[Bf-committers] developer.blender.org maintenance/outage

Dan McGrath danmcgrath.ca at gmail.com
Wed Jun 19 05:04:23 CEST 2019


Hi,

Just giving you all an update on the developer.blender.org slowdowns and
outages from earlier today.

First of all, the reports of our assimilation into the B.ORG collective
have been greatly exaggerated! :D As I am not one for writing big fancy
professional reports, I will try to keep it short and to the point.

Yesterday, Phabricator started to experience slowdowns, which were hard to
look into properly, as I was already busy prepping the night before to
replace a server in the data center, which only slowed things down more. A
quick look into the issue showed that the hard drives were being saturated
with writes. Looking into it a bit more, it seemed that when people visit
the site, the site invokes `git log` on the commits so that the result can
be rendered and displayed to the user. The actual problem would appear to be
that the resulting files are written to a directory on disk
(synchronously?), which created the write IOPS starvation that we saw.
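
To illustrate why synchronous writes of many small files can starve a
spinning disk, here is a minimal Python sketch that times the same writes
with and without a forced flush to disk. It is not what Phabricator does
internally; the function name `write_files`, the file names, the file count
and the payload size are all made up for illustration.

```python
import os
import tempfile
import time


def write_files(directory, count, payload, sync):
    """Write `count` small files and return the elapsed time in seconds.

    With sync=True every file is forced to stable storage before the next
    one is written, which is roughly what a synchronous write path costs.
    """
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"blob-{i}")  # hypothetical file name
        with open(path, "wb") as fh:
            fh.write(payload)
            if sync:
                fh.flush()             # push Python's buffer to the OS
                os.fsync(fh.fileno())  # ask the OS to commit it to disk
    return time.perf_counter() - start


if __name__ == "__main__":
    payload = b"x" * 4096  # assumed payload size, not measured on the server
    with tempfile.TemporaryDirectory() as tmp:
        buffered = write_files(tmp, 200, payload, sync=False)
    with tempfile.TemporaryDirectory() as tmp:
        synced = write_files(tmp, 200, payload, sync=True)
    print(f"buffered: {buffered:.3f}s  synced: {synced:.3f}s")
```

On hard drives the synced run is usually much slower, since each file has
to wait for the disk; the exact ratio depends on the hardware and
filesystem.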

As a workaround, I have changed the ZFS `sync` setting on this dataset to
disabled, which appears to have relaxed the storm a bit. The directory
these uploads go to is a double-hashed directory tree (./AA/AA/, ./AB/AA/,
./AC/AA/, etc.) which totals about 64k directories (OH MY GOD....), so even
doing a `find ./` takes 20 minutes on these systems. We can try to
experiment with putting those files on their own dataset in ZFS, with tuned
recordsizes and properties, but this may not help as much as an SSD, and
more RAM.
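
For reference, the "about 64k" figure follows from the layout: assuming each
level is named by two hex characters, there are 256 x 256 = 65536 leaf
directories. A minimal Python sketch of such a layout is below; the use of
SHA-1 and the names `shard_path` and `shard_count` are assumptions for
illustration, not necessarily how Phabricator names its directories.

```python
import hashlib
import os


def shard_count(chars_per_level=2, levels=2, alphabet_size=16):
    """Number of leaf directories in a hashed directory tree."""
    return (alphabet_size ** chars_per_level) ** levels


def shard_path(root, data):
    """Map some content to a two-level hashed path under `root`.

    Assumption: the first two pairs of hex digits of a SHA-1 digest name
    the two directory levels, and the rest names the file.
    """
    digest = hashlib.sha1(data).hexdigest()
    return os.path.join(root, digest[0:2], digest[2:4], digest[4:])


if __name__ == "__main__":
    print(shard_count())  # 65536
    print(shard_path("storage", b"example file contents"))
```

Spreading files this way keeps any single directory small, at the cost of
a tree walk (like `find ./`) having to visit every one of those directories.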

For now, I will leave the sync in the disabled state so that Phabricator
isn't bogged down. The problem is that the server it's on, with its
current setup, can't just have its drives replaced unless the new drives
are exactly the same size as, or bigger than, the 2TB HDDs (without a
reinstall), and 2TB SSDs aren't exactly ideal on that old clunker of a box!
Worse, moving stuff off there is tricky, as it also holds some of our Bacula
storage, which has nowhere to go without moving a lot of stuff around and
maybe adding more hard drives to Proxmox, which takes time to set up.

Anyway, those are all details for me and the crew to bang out. We will try
to keep an eye on things. Sorry for the delays in your bug reports!


Cheers,

Dan McGrath

On Tue, Jun 18, 2019 at 3:35 AM Dan McGrath <danmcgrath.ca at gmail.com> wrote:

> Hi,
>
> It seems that a few hours ago developer.blender.org became horribly
> slow and unusable. While the exact cause is still to be determined, the
> HTTP logs were tossing an excessive number of errors about unsafe strings.
>
> Sergey is en route to the data center for some planned maintenance
> (replace a server), but has already queued up some git commits to help
> address some of the issues with the PHP errors, and plans to poke at it
> some more once we get things sorted out.
>
> Sorry for the inconvenience!
>
>
> Cheers,
>
> Dan McGrath
>

