[Bf-blender-cvs] [213cd39b6db] master: OBJ: further optimize, cleanup and harden the new C++ importer

Sun Apr 17 21:07:54 CEST 2022

Commit: 213cd39b6db387bd88f12589fd50ff0e6563cf56
Author: Aras Pranckevicius
Date:   Sun Apr 17 22:07:43 2022 +0300
Branches: master
https://developer.blender.org/rB213cd39b6db387bd88f12589fd50ff0e6563cf56

OBJ: further optimize, cleanup and harden the new C++ importer

Continued improvements to the new C++ based OBJ importer.

Performance: about 2x faster.
- Rungholt.obj (several meshes, 263MB file): Windows 12.7s -> 5.9s, Mac 7.7s -> 3.1s.
- Blender 3.0 splash (24k meshes, 2.4GB file): Windows 97.3s -> 53.6s, Mac 137.3s -> 80.0s.
- "Windows" is VS2022, AMD Ryzen 5950X (32 threads), "Mac" is Xcode/clang 13, M1Max (10 threads).
- Slightly reduced memory usage during import as well.

The performance gains are a combination of several things:
- Replacing `std::stof` / `std::stoi` with C++17 `from_chars`.
- Stop reading the input file line by line using `std::getline`; instead read it in 64 KB chunks and parse from there, taking care to handle lines that are split across chunk boundaries.
- Removing abstractions for splitting a line by some char.
- Avoid tiny memory allocations: instead of storing a vector of polygon corners in each face, store all the corners in one big array, and per-face only store indices "where do corners start, and how many". Likewise, don't store full string names of material/group names for each face; only store indices into overall material/group names arrays.
- Stop always doing mesh validation, which is slow. Do it just like the Alembic importer does: only validate if invalid faces were found during import, or if requested by the user via an import setting checkbox (which defaults to off).
- Stop doing "collection sync" for each object being added; instead do the collection sync right after creating all the objects.
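
As an illustration of the chunked reading point above, here is a minimal sketch of splitting input into lines while carrying a partial line over from one chunk to the next. It is not the importer's actual code: the function name `read_lines_chunked` is hypothetical, the input is an in-memory string standing in for file reads, and copying each line into a `std::string` is done only to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the commit): consumes `input` in pieces of at
// most `chunk_size` bytes and returns every line, without the trailing '\n'.
std::vector<std::string> read_lines_chunked(std::string_view input, size_t chunk_size)
{
  assert(chunk_size > 0);  // a zero-sized chunk would never make progress
  std::vector<std::string> lines;
  std::string buffer;  // leftover partial line followed by the newest chunk
  size_t pos = 0;
  while (pos < input.size()) {
    // Stand-in for reading the next chunk from a file.
    const size_t n = std::min(chunk_size, input.size() - pos);
    buffer.append(input.data() + pos, n);
    pos += n;

    // Emit all complete lines currently in the buffer.
    size_t line_start = 0;
    for (;;) {
      const size_t nl = buffer.find('\n', line_start);
      if (nl == std::string::npos) {
        break;
      }
      lines.emplace_back(buffer, line_start, nl - line_start);
      line_start = nl + 1;
    }
    // Keep only the unfinished tail; the next chunk will complete it.
    buffer.erase(0, line_start);
  }
  // A final line without a terminating newline is still a line.
  if (!buffer.empty()) {
    lines.push_back(buffer);
  }
  return lines;
}
```

Calling `read_lines_chunked("v 1 2 3\nv 4 5 6\nf 1 2 3", 5)` yields the three lines intact even though every chunk boundary falls in the middle of a line. A real importer would likely parse directly from the buffer rather than copy each line.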

Cleanup / Robustness:

This reworking of the parser (see "removing abstractions" point above) means that all the functions that were in the `parser_string_utils` files are gone, replaced with a different set of functions. However, they are not OBJ specific, so as pointed out during review of the previous differential, they now live in the `source/blender/io/common` library.

Added gtest coverage for said functions as well; something that was only indirectly covered by obj tests previously.

Reworking some bits of parsing made the parser better able to deal with invalid syntax. E.g. previously, if a face corner was a `/123` string, the parser would have incorrectly treated it as a vertex index (since it would get "hey that's one number" after splitting the string by a slash), instead of properly marking it as invalid syntax.
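
To make the `/123` case concrete, here is a minimal sketch of parsing one face corner left to right with the standard `std::from_chars`, so that a missing vertex index is rejected instead of being mistaken for one. It is not the importer's actual code: `FaceCorner`, `parse_int` and `parse_face_corner` are hypothetical names, and using 0 to mean "index absent" is an assumption of this sketch.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical result type: 0 means "not specified" in this sketch.
struct FaceCorner {
  int vert = 0;
  int uv = 0;
  int normal = 0;
};

// Parses an integer at the front of `s`; on success removes it from `s`.
static bool parse_int(std::string_view &s, int &out)
{
  const char *begin = s.data();
  const char *end = begin + s.size();
  const auto [ptr, ec] = std::from_chars(begin, end, out);
  if (ec != std::errc()) {
    return false;
  }
  s.remove_prefix(static_cast<size_t>(ptr - begin));
  return true;
}

// Accepts "v", "v/vt", "v//vn" and "v/vt/vn"; returns nullopt on bad syntax.
std::optional<FaceCorner> parse_face_corner(std::string_view s)
{
  FaceCorner corner;
  // The vertex index is mandatory, so "/123" fails right here.
  if (!parse_int(s, corner.vert)) {
    return std::nullopt;
  }
  if (s.empty()) {
    return corner;
  }
  if (s.front() != '/') {
    return std::nullopt;
  }
  s.remove_prefix(1);
  // The UV index is optional: "v//vn" skips it.
  if (!s.empty() && s.front() != '/') {
    if (!parse_int(s, corner.uv)) {
      return std::nullopt;
    }
  }
  if (s.empty()) {
    return corner;
  }
  if (s.front() != '/') {
    return std::nullopt;
  }
  s.remove_prefix(1);
  // After the second slash a normal index is required, with nothing after it.
  if (!parse_int(s, corner.normal) || !s.empty()) {
    return std::nullopt;
  }
  return corner;
}
```

With this approach `parse_face_corner("/123")` returns an empty optional, while `"7"`, `"1/2/3"` and `"4//5"` all parse. Because the string is consumed in order, the position of each number is never lost the way it can be after a generic split on `/`.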

Added gtest coverage for .mtl parsing; something that was not covered by any tests at all previously.

Reviewed By: Howard Trickey
Differential Revision: https://developer.blender.org/D14586

===================================================================

A	extern/fast_float/LICENSE-MIT
A	extern/fast_float/README.blender
A	extern/fast_float/README.md
A	extern/fast_float/fast_float.h
M	source/blender/editors/io/io_obj.c
M	source/blender/io/common/CMakeLists.txt
A	source/blender/io/common/IO_string_utils.hh
A	source/blender/io/common/intern/string_utils.cc
A	source/blender/io/common/intern/string_utils_test.cc
M	source/blender/io/wavefront_obj/CMakeLists.txt
M	source/blender/io/wavefront_obj/IO_wavefront_obj.h
M	source/blender/io/wavefront_obj/importer/obj_import_file_reader.cc
M	source/blender/io/wavefront_obj/importer/obj_import_file_reader.hh
M	source/blender/io/wavefront_obj/importer/obj_import_mesh.cc
M	source/blender/io/wavefront_obj/importer/obj_import_mesh.hh
M	source/blender/io/wavefront_obj/importer/obj_import_mtl.cc
M	source/blender/io/wavefront_obj/importer/obj_import_mtl.hh
M	source/blender/io/wavefront_obj/importer/obj_import_objects.hh
M	source/blender/io/wavefront_obj/importer/obj_importer.cc
M	source/blender/io/wavefront_obj/importer/obj_importer.hh
D	source/blender/io/wavefront_obj/importer/parser_string_utils.cc
D	source/blender/io/wavefront_obj/importer/parser_string_utils.hh
M	source/blender/io/wavefront_obj/tests/obj_importer_tests.cc
A	source/blender/io/wavefront_obj/tests/obj_mtl_parser_tests.cc

===================================================================

diff --git a/extern/fast_float/LICENSE-MIT b/extern/fast_float/LICENSE-MIT
new file mode 100644
index 00000000000..2fb2a37ad7f
--- /dev/null
+++ b/extern/fast_float/LICENSE-MIT
@@ -0,0 +1,27 @@
+MIT License
+
+Copyright (c) 2021 The fast_float authors
+
+Permission is hereby granted, free of charge, to any
+person obtaining a copy of this software and associated
+documentation files (the "Software"), to deal in the
+Software without restriction, including without
+limitation the rights to use, copy, modify, merge,
+publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software
+is furnished to do so, subject to the following
+conditions:
+
+The above copyright notice and this permission notice
+shall be included in all copies or substantial portions
+of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF
+ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
+TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
+PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
+SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
+IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+DEALINGS IN THE SOFTWARE.
diff --git a/extern/fast_float/README.blender b/extern/fast_float/README.blender
new file mode 100644
index 00000000000..a584a0511ee
--- /dev/null
+++ b/extern/fast_float/README.blender
@@ -0,0 +1,7 @@
+Project: fast_float
+URL: https://github.com/fastfloat/fast_float
+License: MIT
+Upstream version: 3.4.0 (b7f9d6c)
+Local modifications:
+
+- Took only the fast_float.h header and the license/readme files
diff --git a/extern/fast_float/README.md b/extern/fast_float/README.md
new file mode 100644
index 00000000000..1e1c06d0a3e
--- /dev/null
+++ b/extern/fast_float/README.md
@@ -0,0 +1,218 @@
+## fast_float number parsing library: 4x faster than strtod
+
+![Ubuntu 20.04 CI (GCC 9)](https://github.com/lemire/fast_float/workflows/Ubuntu%2020.04%20CI%20(GCC%209)/badge.svg)
+![Ubuntu 18.04 CI (GCC 7)](https://github.com/lemire/fast_float/workflows/Ubuntu%2018.04%20CI%20(GCC%207)/badge.svg)
+![Alpine Linux](https://github.com/lemire/fast_float/workflows/Alpine%20Linux/badge.svg)
+![MSYS2-CI](https://github.com/lemire/fast_float/workflows/MSYS2-CI/badge.svg)
+![VS16-CLANG-CI](https://github.com/lemire/fast_float/workflows/VS16-CLANG-CI/badge.svg)
+[![VS16-CI](https://github.com/fastfloat/fast_float/actions/workflows/vs16-ci.yml/badge.svg)](https://github.com/fastfloat/fast_float/actions/workflows/vs16-ci.yml)
+
+The fast_float library provides fast header-only implementations for the C++ from_chars
+functions for `float` and `double` types.  These functions convert ASCII strings representing
+decimal values (e.g., `1.3e10`) into binary types. We provide exact rounding (including
+round to even). In our experience, these `fast_float` functions are many times faster than comparable number-parsing functions from existing C++ standard libraries.
+
+Specifically, `fast_float` provides the following two functions with a C++17-like syntax (the library itself only requires C++11):
+
+```C++
+from_chars_result from_chars(const char* first, const char* last, float& value, ...);
+from_chars_result from_chars(const char* first, const char* last, double& value, ...);
+```
+
+The return type (`from_chars_result`) is defined as the struct:
+```C++
+struct from_chars_result {
+    const char* ptr;
+    std::errc ec;
+};
+```
+
+It parses the character sequence [first,last) for a number. It parses floating-point numbers expecting
+a locale-independent format equivalent to the C++17 from_chars function. 
+The resulting floating-point value is the closest floating-point value (using either float or double), 
+using the "round to even" convention for values that would otherwise fall right in-between two values.
+That is, we provide exact parsing according to the IEEE standard.
+
+
+Given a successful parse, the pointer (`ptr`) in the returned value is set to point right after the
+parsed number, and the `value` referenced is set to the parsed value. In case of error, the returned
+`ec` contains a representative error, otherwise the default (`std::errc()`) value is stored.
+
+The implementation does not throw and does not allocate memory (e.g., with `new` or `malloc`).
+
+It will parse infinity and nan values.
+
+Example:
+
+``` C++
+#include "fast_float/fast_float.h"
+#include <iostream>
+ 
+int main() {
+    const std::string input =  "3.1416 xyz ";
+    double result;
+    auto answer = fast_float::from_chars(input.data(), input.data()+input.size(), result);
+    if(answer.ec != std::errc()) { std::cerr << "parsing failure\n"; return EXIT_FAILURE; }
+    std::cout << "parsed the number " << result << std::endl;
+    return EXIT_SUCCESS;
+}
+```
+
+
+Like the C++17 standard, the `fast_float::from_chars` functions take an optional last argument of
+the type `fast_float::chars_format`. It is a bitset value: we check whether 
+`fmt & fast_float::chars_format::fixed` and `fmt & fast_float::chars_format::scientific` are set
+to determine whether we allow the fixed point and scientific notation respectively.
+The default is  `fast_float::chars_format::general` which allows both `fixed` and `scientific`.
+
+The library seeks to follow the C++17 (see [20.19.3](http://eel.is/c++draft/charconv.from.chars).(7.1))  specification. 
+* The `from_chars` function does not skip leading white-space characters.
+* [A leading `+` sign](https://en.cppreference.com/w/cpp/utility/from_chars) is forbidden.
+* It is generally impossible to represent a decimal value exactly as binary floating-point number (`float` and `double` types). We seek the nearest value. We round to an even mantissa when we are in-between two binary floating-point numbers. 
+
+Furthermore, we have the following restrictions:
+* We only support `float` and `double` types at this time.
+* We only support the decimal format: we do not support hexadecimal strings.
+* For values that are either very large or very small (e.g., `1e9999`), we represent them using the infinity or negative infinity value.
+
+We support Visual Studio, macOS, Linux, freeBSD. We support big and little endian. We support 32-bit and 64-bit systems.
+
+
+
+## Using commas as decimal separator
+
+
+The C++ standard stipulates that `from_chars` has to be locale-independent. In
+particular, the decimal separator has to be the period (`.`). However, 
+some users still want to use the `fast_float` library in a locale-dependent 
+manner. Using a separate function called `from_chars_advanced`, we allow the users
+to pass a `parse_options` instance which contains a custom decimal separator (e.g., 
+the comma). You may use it as follows.
+
+```C++
+#include "fast_float/fast_float.h"
+#include <iostream>
+ 
+int main() {
+    const std::string input =  "3,1416 xyz ";
+    double result;
+    fast_float::parse_options options{fast_float::chars_format::general, ','};
+    auto answer = fast_float::from_chars_advanced(input.data(), input.data()+input.size(), result, options);
+    if((answer.ec != std::errc()) || ((result != 3.1416))) { std::cerr << "parsing failure\n"; return EXIT_FAILURE; }
+    std::cout << "parsed the number " << result << std::endl;
+    return EXIT_SUCCESS;
+}
+```
+
+
+## Reference
+
+- Daniel Lemire, [Number Parsing at a Gigabyte per Second](https://arxiv.org/abs/2101.11408), Software: Practice and Experience 51 (8), 2021.
+
+## Other programming languages
+
+- [There is an R binding](https://github.com/eddelbuettel/rcppfastfloat) called `rcppfastfloat`.
+- [There is a Rust port of the fast_float library](https://github.com/aldanor/fast-float-rust/) called `fast-float-rust`.
+- [There is a Java port of the fast_float library](https://github.com/wrandelshofer/FastDoubleParser) called `FastDoubleParser`.
+- [There is a C# port of the fast_float library](https://github.com/CarlVerret/csFastFloat) called `csFastFloat`.
+
+
+## Relation With Other Work
+
+The fastfloat algorithm is part of the [LLVM standard libraries](https://github.com/llvm/llvm-project/commit/87c016078ad72c46505461e4ff8bfa04819fe7ba). 
+
+The fast_float library provides a performance similar to that of the [fast_double_parser](https://github.com/lemire/fast_double_parser) library but using an updated algorithm reworked from the ground up, and while offering an API more in line with the expectations of C++ programmers. The fast_double_parser library is part of the [Microsoft LightGBM machine-learning framework](https://github.com/microsoft/LightGBM).
+
+## Users
+
+The fast_float library is used by [Apache Arrow](https://github.com/apache/arrow/pull/8494) where it multiplied the number parsing speed by two or three times. It is also used by [Yandex ClickHouse](https://github.com/ClickHouse/ClickHouse) and by [Google Jsonnet](https://github.com/google/jsonnet).
+
+
+## How fast is it?
+
+It can parse random floating-point numbers at a speed of 1 GB/s on some systems. We find that it is often twice as fast as the best available competitor, and many times faster than many standard-library implementations.
+
+<img src="http://lemire.me/blog/wp-content/uploads/2020/11/fastfloat_speed.png" width="400">
+
+```
+$ ./build/benchmarks/benchmark 
+# parsing random integers in the range [0,1)
+volume = 2.09808 MB 
+netlib                                  :   271.18 MB/s (+/- 1.2 %)    12.93 Mfloat/s  
+doubleconversion                        :   225.35 MB/s (+/- 1.2 %)    10.74 Mfloat/s  
+strtod                                  :   190.94 MB/s (+/- 1.6 %)     9.10 Mfloat/s  
+abseil                                  :   430.45 MB/s (+/- 2.2 %)    20.52 Mfloat/s  
+fastfloat                               :  1042.38 MB/s (+/- 9.9 %)    49.68 Mfloat/s  
+```
+
+See https://github.com/lemire/simple_fastfloat_benchmark for our benchmarking code.
+
+
+## Video
+
+[![Go Systems 2020](http://img.youtube.com/vi/AVXgvlMeIm4/0.jpg)](http://www.youtube.com/watch?v=AVXgvlMeIm4)<br />
+
+## Using as a CMake dependency
+
+This library is header-only by design. The CMake file provides the `fast_float` target
+which is merely a pointer 

@@ Diff output truncated at 10240 characters. @@