Gitlab & Drone intermittent service outage

Incident Report for ACP

Resolved

Service appears to have stabilised following minor modifications and the stopping of some request-heavy processes. Documentation has been relocated and updated to reflect the recent troubleshooting.
Posted Jun 23, 2025 - 14:43 BST

Update

Identified a process that was regularly scanning Gitlab many times per second, and requested that it be stopped.
No 500 errors have been reported or observed in the logs since 15:19. Gitlab service appears to be restored.
Posted Jun 19, 2025 - 17:00 BST

Investigating

We are again receiving reports of degraded performance and are observing 500 status code errors in the logs.
Posted Jun 19, 2025 - 16:39 BST

Monitoring

Following the restart, Gitlab appears to have returned to normal functionality. Debug logging remains enabled for further monitoring of service performance.
Posted Jun 19, 2025 - 12:37 BST

Investigating

Rolling out a fresh deployment of Gitlab and restarting Redis to clear the cache, in the hope of restoring normal service while the investigation continues.
Posted Jun 19, 2025 - 11:55 BST

Monitoring

Root cause not yet identified.
Posted Jun 19, 2025 - 11:04 BST

Investigating

We have received reports of Gitlab being inaccessible and returning 500 errors, and automated Gitlab and Drone e2e tests are firing repeatedly.

We are currently investigating the root cause of this outage.
Posted Jun 18, 2025 - 14:43 BST
This incident affected: Drone Gitlab and Gitlab.