[Bf-docboard] Project Eddy
Tobias Heinke
heinke.tobias at t-online.de
Tue Jan 23 23:49:20 CET 2018
Hi all,
I turned the regex checks I've used in the past into Python scripts.
When checks produce a lot of false positives, it's only feasible to
apply them to new content.
So I had the idea to apply them to the diffs instead. The goal is to
check/filter changes to the manual,
both incoming and outgoing, so it can be run by a single person.
The first type of check is lint, of which the most important one is to
prevent leaked markup and
also to ensure that only the manual's RST subset (style guide) is used.
Further checks cover spelling mistakes that pass normal spell-checking,
like 'mash' instead of 'mesh';
words that are not yet used in the manual and are therefore likely to be
misspelled, like 'decease' vs. 'decrease';
and code style issues like double spaces, spaces at the end of a line, etc.
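
As a rough illustration of such line-based checks, here is a minimal sketch.
The rule names and patterns are assumptions made up for this example, they are
not the actual expressions from my scripts:

```python
import re

# Hypothetical rule set for illustration only; names and patterns are
# assumptions, not the real expressions used by the tools.
RULES = [
    ("trailing-space", re.compile(r"[ \t]+$")),
    ("double-space", re.compile(r"(?<=\S)  +(?=\S)")),
    ("confusable-word", re.compile(r"\b(mash|decease)\b", re.IGNORECASE)),
]


def lint_line(text):
    """Return the names of all rules that match one line of text."""
    return [name for name, pattern in RULES if pattern.search(text)]
```

Calling lint_line() on each new line gives a list of rule names, which can then
be printed together with the file name and line number.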
The benefits are quite obvious:
- Errors are prevented.
- When these errors aren't committed, the version history gets cleaner.
- Manual cleanup doesn't have to be done so often.
- The commit itself doesn't have to be manually checked for these kinds of
common errors.
Writing the script was easy, because luckily svn_commit.py (by anfelor)
does almost the same. The svn script itself is quite simple (100 lines).
It checks the svn status and makes a diff of the modified files. The
tools' output is filtered for occurrences on lines that start with a '+'.
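
The filtering on '+' lines could look roughly like the sketch below. It is not
the code from svn_commit.py; the function name added_lines and the assumption
that the input is unified diff text (as printed by `svn diff`) are mine. It
counts hunk lengths so that the '---'/'+++' file headers are not mistaken for
removed or added lines, and it does not track file names:

```python
import re

# Hunk header of a unified diff, e.g. "@@ -10,3 +10,4 @@".
# A missing length means 1.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines(diff_text):
    """Yield (new_line_number, text) for every line added in a unified diff."""
    old_left = new_left = 0  # lines still expected in the current hunk
    new_lineno = 0
    for line in diff_text.splitlines():
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: skip file headers until the next hunk starts.
            match = HUNK_HEADER.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_lineno = int(match.group(2))
                new_left = int(match.group(3) or 1)
            continue
        if line.startswith("+"):
            yield new_lineno, line[1:]
            new_lineno += 1
            new_left -= 1
        elif line.startswith("-"):
            old_left -= 1  # removed line, not present in the new file
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            new_lineno += 1  # context line, present in both versions
            old_left -= 1
            new_left -= 1
```

Only the yielded lines are then passed to the checks, so existing content never
shows up in the report.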
I expanded the rst-helper so that it can either iterate over almost every
RST construct or remove it (to prevent false positives).
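
To show what "removing" a construct means, here is a minimal sketch that masks
inline literals and roles before the prose checks run. The pattern and the name
mask_markup are assumptions for this example; the real rst-helper covers many
more constructs:

```python
import re

# Matches a role with content such as :kbd:`Ctrl-A`, or an inline
# literal such as ``code``. Illustrative pattern only.
MASKED = re.compile(r"(:[a-zA-Z][\w.+-]*:`[^`]*`|``.+?``)")


def mask_markup(line):
    """Replace roles and inline literals with underscores of equal length.

    Keeping the length means column positions in the report stay valid,
    and underscores (unlike spaces) do not trigger a double-space check.
    """
    return MASKED.sub(lambda m: "_" * len(m.group(0)), line)
```

A word list check run on the masked line no longer sees text inside literals
or roles.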
The tools have a common input schema and use the same output format.
I think it's worth adding it to the tools folder, but it needs a utility
(stemmer) and data.
It has to be finished, polished, and tested anyway and that will take
some time.
Outlook:
The interface that selects subsets of checks has to be finished.
Preventing false positives is either computationally expensive or
challenging to manage (almost like parsing).
A final feature could be automatic fixes for the 100%-ers (checks without
false positives) and a y/n console interface for the rest.
The tools in the [tools] folder can also be adapted.
Finally some numbers:
- 7 tools (e.g. char count of headline underlines, indentation, etc.)
- 45 regular expressions (which function as a tool when grouped, e.g. to
prevent leaked markup)
- 5 lists (confusable words, Blender UI, British English, domains of
external links, comma phrases)
- 2 utility tools (stemmer, fuzzy string matching)
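
As an example of one of the small tools, the headline underline check can be
sketched like this. The set of underline characters and the function name are
assumptions for this example, and the sketch compares plain character counts
only:

```python
import re

# A line made of one repeated heading character (at least two).
# The character set is an assumption about what the manual uses.
UNDERLINE = re.compile(r'^([#*=\-^"])\1+\s*$')


def underline_mismatches(lines):
    """Return (line_number, title_length, underline_length) tuples for
    heading underlines whose length differs from the title above them."""
    problems = []
    for index in range(1, len(lines)):
        title = lines[index - 1].rstrip()
        underline = lines[index].rstrip()
        if not title or not UNDERLINE.match(underline):
            continue  # blank line above (transition) or not an underline
        if UNDERLINE.match(title):
            continue  # the line above is itself an overline or rule
        if len(underline) != len(title):
            problems.append((index + 1, len(title), len(underline)))
    return problems
```

Line numbers in the result are 1-based, so they can be printed directly in the
report.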
Tobias