[Bf-docboard] Project Eddy

Tue Jan 23 23:49:20 CET 2018

Hi all,

I turned the regex checks I've done in the past into Python scripts.
When these checks have a lot of false positives, it's only feasible to
apply them to new content.
So I had the idea to apply them to the diffs instead. The goal is to
check/filter changes to the manual, both incoming and outgoing, so it
can be run by a single person.

The first type of check is lint. The most important checks here are to
prevent leaked markup and to ensure that only the manual's RST subset
(style guide) is used.
It also covers spelling mistakes that pass normal spell-checking, like
'mash' instead of 'mesh'; words that are not yet used in the manual and
are therefore likely to be misspelled, like 'decease' vs. 'decrease';
and code style issues such as double spaces and spaces at the end of a
line.
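
To make the idea concrete, here is a minimal sketch of such line-based
checks. It is not the actual script: the CHECKS table, the patterns and
the function name lint_line are assumptions chosen for illustration only.

```python
import re

# Hypothetical check table (name, pattern); not the real rule set.
CHECKS = [
    # Two or more spaces between non-space characters (ignores indentation).
    ("double space", re.compile(r"(?<=\S) {2,}(?=\S)")),
    # Spaces or tabs at the end of the line.
    ("trailing whitespace", re.compile(r"[ \t]+$")),
    # Words that pass a spell checker but are probably wrong in the manual.
    ("confusable word", re.compile(r"\b(mash|decease)\b", re.IGNORECASE)),
]


def lint_line(line):
    """Return the names of all checks that match the given line."""
    return [name for name, pattern in CHECKS if pattern.search(line)]


if __name__ == "__main__":
    for sample in ["Select the mash  and press Tab. ", "Select the mesh."]:
        print(repr(sample), lint_line(sample))
```

The lookbehind in the double-space pattern keeps leading indentation
from being reported, since indentation is significant in RST.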

The benefits are quite obvious:
- Errors are prevented.
- When these errors aren't committed, the version history gets cleaner.
- Manual clean-up doesn't have to be done as often.
- The commit itself doesn't have to be checked manually for these kinds
of common errors.

Writing the script was easy because, luckily, svn_commit.py (by anfelor)
does almost the same. The svn script itself is quite simple (100 lines).
It checks the svn status and makes a diff of the modified files. The
tools' output is then filtered for occurrences on lines that start with
a '+', i.e. lines added by the change.
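
The filtering on '+' lines could look roughly like the sketch below. It
is not taken from svn_commit.py; the function name added_lines and the
parsing details are assumptions. It reads a unified diff (as printed by
svn diff) and uses the hunk sizes to tell added content apart from the
'+++' file header.

```python
import re

# Hunk header of a unified diff, e.g. "@@ -10,3 +10,4 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines(diff_text):
    """Yield (path, line_number, text) for every line added in the diff."""
    path = None
    old_left = new_left = 0  # lines still expected in the current hunk
    lineno = 0               # line number in the new version of the file
    for raw in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if raw.startswith("+"):        # added line
                yield path, lineno, raw[1:]
                lineno += 1
                new_left -= 1
            elif raw.startswith("-"):      # removed line
                old_left -= 1
            elif raw.startswith("\\"):     # "\ No newline at end of file"
                pass
            else:                          # unchanged context line
                lineno += 1
                old_left -= 1
                new_left -= 1
            continue
        hunk = HUNK_HEADER.match(raw)
        if hunk:
            # A missing count in the header means a length of 1.
            old_left = int(hunk.group(1) or 1)
            lineno = int(hunk.group(2))
            new_left = int(hunk.group(3) or 1)
        elif raw.startswith("+++ "):
            # The file name ends at the tab before "(working copy)".
            path = raw[4:].split("\t")[0]
```

Each yielded line can then be passed to the individual checks, so only
new or changed content is reported.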

I expanded the rst-helper to iterate over almost every rst-construct or 
to remove it (to prevent false positives).
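
As an illustration of removing constructs before checking (a sketch
only, not the actual rst-helper): the patterns, the function name
mask_markup and the filler character are all assumptions. Matches are
overwritten with a filler of the same length, so column positions in
later reports still fit the original line.

```python
import re

# Hypothetical list of inline constructs whose content is not checked.
MASKED = [
    re.compile(r"``.+?``"),          # inline literal
    re.compile(r":[\w-]+:`[^`]+`"),  # role, e.g. :kbd:`Ctrl-A`
    re.compile(r"`[^`]+`_{1,2}"),    # hyperlink reference
]


def mask_markup(line, filler="_"):
    """Replace each masked construct with filler characters of equal length."""
    for pattern in MASKED:
        line = pattern.sub(lambda m: filler * len(m.group(0)), line)
    return line


if __name__ == "__main__":
    print(mask_markup("Press :kbd:`Ctrl-A` to select ``all``."))
```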
The tools have a common input schema and use the same output format.

I think it's worth adding it to the tools folder, but it needs a utility 
(stemmer) and data.
It has to be finished, polished, and tested anyway and that will take 
some time.

Outlook:
The interface that selects subsets of checks has to be finished.
Preventing false positives is either computationally expensive or
challenging to manage (almost like parsing).
A final feature could be automatic fixes of 100%-ers and a y/n console 
interface.
The tools in the [tools] folder could also be adapted.
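
A y/n console interface for fixes might be sketched as follows. This is
only an assumption about how it could work: the double-space fix stands
in for a "100%" check, and the names review and ask are hypothetical.
The prompt function is a parameter so the loop can be tested without a
console.

```python
import re

# Hypothetical safe fix: collapse runs of spaces between words.
DOUBLE_SPACE = re.compile(r"(?<=\S) {2,}(?=\S)")


def review(lines, ask=input):
    """Return the lines with every fix applied that the user accepted."""
    result = []
    for number, line in enumerate(lines, start=1):
        fixed = DOUBLE_SPACE.sub(" ", line)
        if fixed != line:
            answer = ask(f"{number}: {line!r} -> {fixed!r}  apply? [y/n] ")
            if answer.strip().lower() == "y":
                line = fixed
        result.append(line)
    return result
```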

Finally some numbers:
- 7 tools (e.g. char count of headline underlines, indentation, etc.)
- 45 regular expressions (which, in groups, function as a tool, e.g. to
prevent leaked markup)
- 5 lists (confusable words, Blender UI, British English, domains of 
external links, comma phrases)
- 2 utility tools (stemmer, fuzzy string matching)
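
To give an impression of one of these tools, here is a minimal sketch of
a headline underline check. It is not the real tool: the function name,
the set of underline characters and the rule that the underline must
match the title length exactly are assumptions.

```python
import re

# A line made of one repeated punctuation character (three or more).
UNDERLINE = re.compile(r'^([=\-~^"#*+])\1{2,}\s*$')


def check_underlines(lines):
    """Yield (line_number, title) where underline and title lengths differ."""
    for i in range(1, len(lines)):
        title = lines[i - 1].rstrip()
        underline = lines[i].rstrip()
        # Skip blank lines and overlines that precede a title.
        if not title or UNDERLINE.match(title):
            continue
        if UNDERLINE.match(underline) and len(underline) != len(title):
            yield i + 1, title


if __name__ == "__main__":
    sample = ["Title", "====", "", "Other", "-----"]
    for lineno, title in check_underlines(sample):
        print(f"line {lineno}: underline length does not match {title!r}")
```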

Tobias