[Bf-docboard] Bf-docboard Digest, Vol 92, Issue 1

Dan McGrath danmcgrath.ca at gmail.com
Mon Oct 8 18:58:10 CEST 2012


I figured that rather than replying to a bunch of separate emails, I
would answer this one, and also expand on it to give any future admins
a bit of insight into the setup where possible.

On Mon, Oct 8, 2012 at 8:14 AM, Kesten Broughton
<solarmobiletrailers at gmail.com> wrote:
> 1) Media wiki needs upgrade, the current one seems to be unstable
> by media wiki, do you mean wiki.blender.org?
> what instabilities have been found?

As Brecht discusses in the next email, I think it is more a problem of
the system not being maintained with regular updates.

> 2) If you type a url, the system now demands a captcha for people who are
> logged in even. Can this be disabled?
> For blender, devs can check changes in the svn repository to find out what
> change broke stuff.  Is there a similar code base for the wiki?

This is a bit of a pain point (at least I found it to be). Essentially,
there are two SVN repositories. One is public, and anyone can access
it, I believe. The second is a private SVN repo that mindrones used to
track the live changes to the wiki code base: I would push changes to
the public one, he would manually cherry-pick them into the private
one, and the private one would then be pulled into the actual
production (or test) server.
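
For future admins, the cherry-picking could look something like the
following sketch (the URLs and paths are made up, and I don't recall
the exact mechanics, so treat this as the rough shape only):

    # 1. Export one change from the public repo as a patch:
    svn diff -c 1234 https://public.example.org/wiki/trunk > r1234.patch

    # 2. Apply it to a working copy of the private repo and commit:
    cd ~/wiki-private-wc
    patch -p0 < r1234.patch
    svn commit -m "Cherry-pick public r1234"

    # 3. Pull the change into the production (or test) checkout:
    ssh wiki.example.org 'cd /srv/wiki && svn update'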

The biggest pain point with the private repository, though, is that it
only contains a small portion of the actual wiki that you deploy onto
the production server. So, as you can imagine, you get a ton of "?"
listings every time you run `svn status`, since it would appear that
no ignore entries were ever maintained. The private SVN was also not
always in sync. As a result, getting actual changes deployed involved
jumping through a lot of hoops.
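
Whoever inherits this can at least quiet the status noise cheaply;
something like the following registers ignore entries once and for all
(the pattern list here is a guess at what our tree needs):

    # Keep unversioned runtime files out of `svn status` output.
    cat > .svnignore <<'EOF'
    cache
    images
    LocalSettings.php
    EOF
    svn propset svn:ignore -F .svnignore .
    svn commit -m "Maintain svn:ignore so status output stays readable"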

As for the databases, there are two MySQL DBs whose names, IIRC, are
based on the type of DB (production vs test), with a timestamp
appended marking when they were put in place. Unfortunately, the
databases themselves were not always in sync, and since a lot of the
"code" of the wiki is done via templates, the result is that not all
changes work as expected when you deploy them on the test server.

To top it all off, we have the extreme complication of the Sphinx
indexing system that is used for searches. For those who don't know,
Sphinx is an external, third-party indexing server that you point at a
database; through configuration files, you explain to it which fields
in which tables are to be indexed, how to index them, and whether they
are an enum type, a string, a timestamp, etc. Even though Sphinx can
index massive amounts of data just fine, the problem is the interface
to the wiki itself.
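
To make that concrete, a Sphinx source/index pair looks roughly like
this. This is a minimal sketch against the stock MediaWiki schema; our
real stanzas are considerably hairier:

    source wiki_main
    {
        type     = mysql
        sql_host = localhost
        sql_user = wiki
        sql_pass = secret
        sql_db   = wikidb

        # Pull the latest text of every page. The first column must
        # be a unique document id.
        sql_query = \
            SELECT page_id, page_namespace, page_title, old_text \
            FROM page \
            JOIN revision ON page_latest = rev_id \
            JOIN text ON rev_text_id = old_id

        # Non-text columns are declared as attributes so that
        # searches can filter and group on them.
        sql_attr_uint = page_namespace
    }

    index wiki_main
    {
        source = wiki_main
        path   = /var/data/sphinx/wiki_main
    }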

As you can imagine, in order to use Sphinx from MediaWiki itself,
someone had to write an extension that can access and query the Sphinx
API from within the wiki software. While I don't recall the specifics,
I don't think the wiki extension was written by the same people who
wrote Sphinx (although they might have had some involvement; I am not
entirely sure, TBH). Regardless, we found early on that the extension
itself was rather lackluster overall in terms of functionality,
features, configuration, and the search interface it presents in the
actual wiki.

The underlying problem with the search seemed to stem from some
decisions that were made regarding i18n/l10n in the early days of the
wiki. I don't remember all of the specifics at the moment, but my
understanding is that a "normal" MW install (like Wikipedia) uses
something called inter-language links, which point to an actual
separate install of the wiki for each language. Our install, however,
lumped everything into one massive installation and used subpages and
whatnot to separate all of the languages. There is a little more to
the specifics, but you hopefully get the idea.

So, to get Sphinx running and support the 20 or 30 or so languages
that we wanted to offer in a single installation, we first had to
reorganize (or "normalize", in DB lingo) the structure of the wiki so
that we could isolate the individual pages for Sphinx to index. Since
we didn't use inter-language links but instead encoded everything in
the page title (i.e. EN/MyPage/foo or FR/MyPage/foo), and the page
titles themselves were all in English regardless of the page's
language, we had no direct access to the metadata (in this case, the
language). Hence the reorganization: everything, including ambiguous
prefixes like IK (a language code, and a feature in Blender!), had to
be moved around so that we could tell Sphinx, via some SQL functions
that parse page titles, which language and which Blender version each
page belongs to, and then use that data, which was not available
otherwise, in actual searches. (Version is what we ended up calling
"series"; it just indicates whether a page is for 2.4, 2.6, etc.)
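
The flavour of that title-parsing SQL was roughly as follows. This is
illustrative only; the real queries lived inside the Sphinx config,
and the real title scheme differed a bit from the toy
'EN/2.6/Manual/Render' shape assumed here:

    -- Derive language and series from a slash-delimited page title
    -- using plain MySQL string functions.
    SELECT
        page_id,
        SUBSTRING_INDEX(page_title, '/', 1) AS lang,
        SUBSTRING_INDEX(SUBSTRING_INDEX(page_title, '/', 2), '/', -1)
            AS series
    FROM page;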

To complicate things even further, we had to deal with updates to the
Sphinx config files for dozens and dozens of languages, since for
indexing to function properly, you have to tell Sphinx what language
the text it is scanning is in (remember, our particular wiki setup has
NO idea, due to the way things are set up; the metadata is only
available via the page title). This proved to be a problem initially
(many hours and days lost to simple typos!), since changing SQL in
100+ places (30 or 40 language stanzas, times 2 changes each for the
full and incremental index-update SQL) was very error-prone. In the
end, I used M4 macros as a template system, so that adding a new
language was no longer so prone to typos, and then regenerated the
massive config files for the Sphinx process.
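
Conceptually, the M4 side looks like this toy version (the real macros
also generate the incremental-index SQL and a pile of per-language
tuning; wiki_base is assumed to hold the shared connection settings):

    changequote([,])dnl
    dnl One macro expands to a per-language source/index pair, so
    dnl adding a language is one line instead of editing 100+ spots.
    define([WIKI_LANG], [
    source wiki_$1 : wiki_base
    {
        sql_query = SELECT page_id, page_title, old_text \
                    FROM page WHERE page_title LIKE '$2/%'
    }
    index wiki_$1
    {
        source = wiki_$1
        path   = /var/data/sphinx/wiki_$1
    }])dnl
    WIKI_LANG([en], [EN])
    WIKI_LANG([fr], [FR])

Running `m4 sphinx.conf.m4 > sphinx.conf` then regenerates the whole
config from that one template.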

As for the Sphinx extension in MediaWiki: the stock search system was
not able to query or display the results in a fashion that we felt was
ideal for the pages. For example, we now had implied metadata in the
page titles, but the internal search engine had no idea about any of
this, so it would tend to just search everything (not that it was
slow, just inefficient). The internal search engine itself could have
been extended to deal with the exact setup we use for page titles, but
since the admin and mindrones were already down the Sphinx path when I
came along, we stuck with it.

As for the extension itself, I ultimately ended up hacking it (meant
to be a temporary, short-term fix) so that it could handle queries via
HTTP GET to display particular languages and/or versions (series), in
addition to the typical namespaces that MW supports. Also, since the
team made the decision to use a fancy tabbed layout, with the
possibility of displaying all series at the same time on a search
result page, I had to change the extension so that it could do this,
since by default Sphinx did not allow this type of search "grouping"
(it actually did, but with a limit of only 3 results per group).
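
In spirit, the hack was along these lines. This is a from-memory
sketch rather than the actual extension code; langToId() and
seriesToId() are made-up helpers, and lang_id/series_id are assumed to
be integer attributes declared in the Sphinx config:

    <?php
    require_once 'sphinxapi.php';  // stock Sphinx PHP client

    // Map query-string parameters onto Sphinx filters.
    $lang   = isset($_GET['lang'])   ? $_GET['lang']   : 'en';
    $series = isset($_GET['series']) ? $_GET['series'] : null;

    $cl = new SphinxClient();
    $cl->SetServer('localhost', 9312);
    $cl->SetFilter('lang_id', array(langToId($lang)));
    if ($series !== null) {
        $cl->SetFilter('series_id', array(seriesToId($series)));
    }
    $result = $cl->Query($_GET['q'], 'wiki_main');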

The result of all this is that the search became an overly complex
(and resource-intensive) system that needs a full, proper rewrite. It
ended up "working", but it is difficult to maintain, since it requires
a large amount of access that has to be coordinated with the system
administrator. Combine that with the VCS setup that is used, and even
updating the wiki itself can be a lesson in frustration ;)

So, while I can't speak for the future of the wiki as far as search
goes, I will say this: whoever wishes to get involved with the
long-term maintenance of the system needs to understand what they are
getting into. Ultimately, I think the system should be refreshed to
something more modern. Things like configuration management (Puppet,
Chef, CFEngine) come to mind, or perhaps a mini cluster of VMs on a
single box to handle the test -> production flow (many mistakes and
frustrations were caused by having both on the same box, with external
processes fighting over the same port bindings, etc.), but all of this
requires time, commitment and proper planning.

Anyways, I hate to hit all of you with my walls of text :) but I
figured it was important for me to explain this before some poor soul
goes walking into some fresh new hell ;) As usual, feel free to
contact me if you are curious or need some clarification. I think the
server as a whole is running just fine thanks to Marco (the sysadmin),
and I am sure that between Luca, Marco and myself, we could answer
most questions about the setup for anyone interested in helping out.


Dan



