[Bf-blender-cvs] [ae94e36cfb2] master: Geometry Nodes: refactor array devirtualization

Jacques Lucke noreply at git.blender.org
Tue Apr 26 17:14:41 CEST 2022


Commit: ae94e36cfb2f3bc9a99b638782092d9c71d4b3c7
Author: Jacques Lucke
Date:   Tue Apr 26 17:12:34 2022 +0200
Branches: master
https://developer.blender.org/rBae94e36cfb2f3bc9a99b638782092d9c71d4b3c7

Geometry Nodes: refactor array devirtualization

Goals:
* Better high level control over where devirtualization occurs. There is always
  a trade-off between performance and compile-time/binary-size.
* Simplify using array devirtualization.
* Better performance for cases where devirtualization wasn't used before.

Many geometry nodes accept fields as inputs. Internally, that means that the
execution functions have to accept so called "virtual arrays" as inputs. Those
 can be e.g. actual arrays, just single values, or lazily computed arrays.
Due to these different possible virtual arrays implementations, access to
individual elements is slower than it would be if everything was just a normal
array (access does through a virtual function call). For more complex execution
functions, this overhead does not matter, but for small functions (like a simple
addition) it very much does. The virtual function call also prevents the compiler
from doing some optimizations (e.g. loop unrolling and inserting simd instructions).

The solution is to "devirtualize" the virtual arrays for small functions where the
overhead is measurable. Essentially, the function is generated many times with
different array types as input. Then there is a run-time dispatch that calls the
best implementation. We have been doing devirtualization in e.g. math nodes
for a long time already. This patch just generalizes the concept and makes it
easier to control. It also makes it easier to investigate the different trade-offs
when it comes to devirtualization.

Nodes that we've optimized using devirtualization before didn't get a speedup.
However, a couple of nodes are using devirtualization now, that didn't before.
Those got a 2-4x speedup in common cases.
* Map Range
* Random Value
* Switch
* Combine XYZ

Differential Revision: https://developer.blender.org/D14628

===================================================================

M	source/blender/blenkernel/intern/curve_to_mesh_convert.cc
M	source/blender/blenkernel/intern/type_conversions.cc
A	source/blender/blenlib/BLI_devirtualize_parameters.hh
A	source/blender/blenlib/BLI_parameter_pack_utils.hh
M	source/blender/blenlib/BLI_virtual_array.hh
M	source/blender/blenlib/CMakeLists.txt
M	source/blender/functions/FN_multi_function_builder.hh
M	source/blender/functions/FN_multi_function_param_type.hh
M	source/blender/functions/FN_multi_function_params.hh
M	source/blender/functions/FN_multi_function_signature.hh
M	source/blender/functions/intern/multi_function.cc
M	source/blender/functions/intern/multi_function_procedure.cc
M	source/blender/functions/intern/multi_function_procedure_executor.cc
M	source/blender/geometry/intern/mesh_to_curve_convert.cc
M	source/blender/nodes/NOD_math_functions.hh
M	source/blender/nodes/function/nodes/node_fn_boolean_math.cc
M	source/blender/nodes/function/nodes/node_fn_compare.cc
M	source/blender/nodes/function/nodes/node_fn_float_to_int.cc
M	source/blender/nodes/function/nodes/node_fn_random_value.cc
M	source/blender/nodes/geometry/nodes/node_geo_curve_resample.cc
M	source/blender/nodes/geometry/nodes/node_geo_curve_sample.cc
M	source/blender/nodes/geometry/nodes/node_geo_extrude_mesh.cc
M	source/blender/nodes/geometry/nodes/node_geo_mesh_to_points.cc
M	source/blender/nodes/geometry/nodes/node_geo_switch.cc
M	source/blender/nodes/shader/nodes/node_shader_map_range.cc
M	source/blender/nodes/shader/nodes/node_shader_math.cc
M	source/blender/nodes/shader/nodes/node_shader_sepcomb_xyz.cc
M	source/blender/nodes/shader/nodes/node_shader_vector_math.cc

===================================================================

diff --git a/source/blender/blenkernel/intern/curve_to_mesh_convert.cc b/source/blender/blenkernel/intern/curve_to_mesh_convert.cc
index 0dc46f87537..88e0bae0c13 100644
--- a/source/blender/blenkernel/intern/curve_to_mesh_convert.cc
+++ b/source/blender/blenkernel/intern/curve_to_mesh_convert.cc
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 
 #include "BLI_array.hh"
+#include "BLI_devirtualize_parameters.hh"
 #include "BLI_set.hh"
 #include "BLI_task.hh"
 
diff --git a/source/blender/blenkernel/intern/type_conversions.cc b/source/blender/blenkernel/intern/type_conversions.cc
index 1c2665769db..0b5d6ad7b10 100644
--- a/source/blender/blenkernel/intern/type_conversions.cc
+++ b/source/blender/blenkernel/intern/type_conversions.cc
@@ -24,7 +24,8 @@ static void add_implicit_conversion(DataTypeConversions &conversions)
       conversion_name.c_str(),
       /* Use lambda instead of passing #ConversionF directly, because otherwise the compiler won't
        * inline the function. */
-      [](const From &a) { return ConversionF(a); }};
+      [](const From &a) { return ConversionF(a); },
+      fn::CustomMF_presets::AllSpanOrSingle()};
   static auto convert_single_to_initialized = [](const void *src, void *dst) {
     *(To *)dst = ConversionF(*(const From *)src);
   };
diff --git a/source/blender/blenlib/BLI_devirtualize_parameters.hh b/source/blender/blenlib/BLI_devirtualize_parameters.hh
new file mode 100644
index 00000000000..bf4f6c47cfe
--- /dev/null
+++ b/source/blender/blenlib/BLI_devirtualize_parameters.hh
@@ -0,0 +1,309 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#pragma once
+
+/** \file
+ * \ingroup bli
+ *
+ * In geometry nodes, many functions accept fields as inputs. For the implementation that means
+ * that the inputs are virtual arrays. Usually those are backed by actual arrays or single values
+ * but sometimes virtual arrays are used to compute values on demand or convert between data
+ * formats.
+ *
+ * Using virtual arrays has the downside that individual elements are accessed through a virtual
+ * method call, which has some overhead compared to normal array access. Whether this overhead is
+ * negilible depends on the context. For very small functions (e.g. a single addition), the
+ * overhead can make the function many times slower. Furthermore, it prevents the compiler from
+ * doing some optimizations (e.g. loop unrolling and inserting SIMD instructions).
+ *
+ * The solution is to "devirtualize" the virtual arrays in cases when the overhead cannot be
+ * ignored. That means that the function is instantiated multiple times at compile time for the
+ * different cases. For example, there can be an optimized function that adds a span and a single
+ * value, and another function that adds a span and another span. At run-time there is a dynamic
+ * dispatch that executes the best function given the specific virtual arrays.
+ *
+ * The problem with this devirtualization is that it can result in exponentially increasing compile
+ * times and binary sizes, depending on the number of parameters that are devirtualized separately.
+ * So there is always a trade-off between run-time performance and compile-time/binary-size.
+ *
+ * This file provides a utility to devirtualize array parameters to a function using a high level
+ * API. This makes it easy to experiment with different extremes of the mentioned trade-off and
+ * allows finding a good compromise for each function.
+ */
+
+#include "BLI_parameter_pack_utils.hh"
+#include "BLI_virtual_array.hh"
+
+namespace blender::devirtualize_parameters {
+
+/**
+ * Bit flag that specifies how an individual parameter is or can be devirtualized.
+ */
+enum class DeviMode {
+  /* This is used as zero-value to compare to, to avoid casting to int. */
+  None = 0,
+  /* Don't use devirtualization for that parameter, just pass it along. */
+  Keep = (1 << 0),
+  /* Devirtualize #Varray as #Span. */
+  Span = (1 << 1),
+  /* Devirtualize #VArray as #SingleAsSpan.  */
+  Single = (1 << 2),
+  /* Devirtualize #IndexMask as #IndexRange. */
+  Range = (1 << 3),
+};
+ENUM_OPERATORS(DeviMode, DeviMode::Range);
+
+/** Utility to encode multiple #DeviMode in a type. */
+template<DeviMode... Mode> using DeviModeSequence = ValueSequence<DeviMode, Mode...>;
+
+/**
+ * Main class that performs the devirtualization.
+ */
+template<typename Fn, typename... SourceTypes> class Devirtualizer {
+ private:
+  /** Utility to get the tag of the I-th source type. */
+  template<size_t I>
+  using type_at_index = typename TypeSequence<SourceTypes...>::template at_index<I>;
+  static constexpr size_t SourceTypesNum = sizeof...(SourceTypes);
+
+  /** Function to devirtualize. */
+  Fn fn_;
+
+  /**
+   * Source values that will be devirtualized. Note that these are stored as pointers to avoid
+   * unnecessary copies. The caller is responsible for keeping the memory alive.
+   */
+  std::tuple<const SourceTypes *...> sources_;
+
+  /** Keeps track of whether #fn_ has been called already to avoid calling it twice. */
+  bool executed_ = false;
+
+ public:
+  Devirtualizer(Fn fn, const SourceTypes *...sources) : fn_(std::move(fn)), sources_{sources...}
+  {
+  }
+
+  /**
+   * Return true when the function passed to the constructor has been called already.
+   */
+  bool executed() const
+  {
+    return executed_;
+  }
+
+  /**
+   * At compile time, generates multiple variants of the function, each optimized for a different
+   * combination of devirtualized parameters. For every parameter, a bit flag is passed that
+   * determines how it will be devirtualized. At run-time, if possible, one of the generated
+   * functions is picked and executed.
+   *
+   * To check whether the function was called successfully, call #executed() afterwards.
+   *
+   * \note This generates an exponential amount of code in the final binary, depending on how many
+   * to-be-virtualized parameters there are.
+   */
+  template<DeviMode... AllowedModes>
+  void try_execute_devirtualized(DeviModeSequence<AllowedModes...> /* allowed_modes */)
+  {
+    BLI_assert(!executed_);
+    static_assert(sizeof...(AllowedModes) == SourceTypesNum);
+    return this->try_execute_devirtualized_impl(DeviModeSequence<>(),
+                                                DeviModeSequence<AllowedModes...>());
+  }
+
+  /**
+   * Execute the function and pass in the original parameters without doing any devirtualization.
+   */
+  void execute_without_devirtualization()
+  {
+    BLI_assert(!executed_);
+    this->try_execute_devirtualized_impl_call(
+        make_value_sequence<DeviMode, DeviMode::Keep, SourceTypesNum>(),
+        std::make_index_sequence<SourceTypesNum>());
+  }
+
+ private:
+  /**
+   * A recursive method that generates all the combinations of devirtualized parameters that the
+   * caller requested. A recursive function is necessary to achieve generating an exponential
+   * number of function calls (which has to be used with care, but is expected here).
+   *
+   * At every recursive step, the #DeviMode of one parameter is determined. This is achieved by
+   * extending #DeviModeSequence<Mode...> by one element in each step. The recursion ends once all
+   * parameters are handled.
+   */
+  template<DeviMode... Mode, DeviMode... AllowedModes>
+  void try_execute_devirtualized_impl(
+      /* Initially empty, but then extended by one element in each recursive step.  */
+      DeviModeSequence<Mode...> /* modes */,
+      /* Bit flag for every parameter. */
+      DeviModeSequence<AllowedModes...> /* allowed_modes */)
+  {
+    static_assert(SourceTypesNum == sizeof...(AllowedModes));
+    if constexpr (SourceTypesNum == sizeof...(Mode)) {
+      /* End of recursion, now call the function with the determined #DeviModes. */
+      this->try_execute_devirtualized_impl_call(DeviModeSequence<Mode...>(),
+                                                std::make_index_sequence<SourceTypesNum>());
+    }
+    else {
+      /* Index of the parameter that is checked in the current recursive step. */
+      constexpr size_t I = sizeof...(Mode);
+      /* Non-devirtualized parameter type. */
+      using SourceType = type_at_index<I>;
+      /* A bit flag indicating what devirtualizations are allowed in this step. */
+      constexpr DeviMode allowed_modes = DeviModeSequence<AllowedModes...>::template at_index<I>();
+
+      /* Handle #VArray types. */
+      if constexpr (is_VArray_v<SourceType>) {
+        /* The actual virtual array, used for dynamic dispatch at run-time. */
+        const SourceType &varray = *std::get<I>(sources_);
+        /* Check if the virtual array is a single value. */
+        if constexpr ((allowed_modes & DeviMode::Single) != DeviMode::None) {
+          if (varray.is_single()) {
+            this->try_execute_devirtualized_impl(DeviModeSequence<Mode..., DeviMode::Single>(),
+                                                 DeviModeSequence<AllowedModes...>());
+          }
+        }
+        /* Check if the virtual array is a span. */
+        if constexpr ((allowed_modes & DeviMode::Span) != DeviMode::None) {
+          if (varray.is_span()) {
+            this->try_execute_devirtualized_impl(DeviModeSequence<Mode..., DeviMode::Span>(),
+                                                 DeviModeSequence<AllowedModes...>());
+          }
+        }
+        /* Check if it is ok if the virtual array is not devirtualized. */
+        if constexpr ((allowed_modes & DeviMode::Keep) != DeviMode::None) {
+          this->try_execute_devirtualized_impl(DeviModeSequence<Mode..., DeviMode::Keep>(),
+                                               DeviModeSequence<AllowedModes...>());
+        }
+      }
+
+      /* Handle #IndexMask. */
+      else if constexpr (std::is_same_v<IndexMask, SourceType>) {
+        /* Check if the mask is actually a contiguous range. */
+        if constexpr ((allowed_modes & DeviMode::Range) != DeviMode::None) {
+          /* The actual mask used for dynamic dispatch at run-time. */
+          const IndexMask &mask = *std::get<I>(sources_);
+          if (mask.is_range()) {
+            this->try_execute_devirtualized_impl(DeviModeSequence<Mode..., DeviMode::Range>(),

@@ Diff output truncated at 10240 characters. @@



More information about the Bf-blender-cvs mailing list