This class implements a cache of GPU render pipelines to avoid recreating them each frame.

## Cache implementation
The cache is a node tree with the same structure as the bind group cache:
```typescript
class NodeState {
    public values: { [id: number]: NodeState };
    public pipeline: GPURenderPipeline;

    constructor() {
        this.values = {};
    }
}

export class WebGPUCacheRenderPipelineTree {
    private static _Cache: NodeState = new NodeState();
    ...
}
```
The difference is that the ids encode the pipeline state. As there are numerous states that define a pipeline, we need several ids. Currently, the list is:
```typescript
enum StatePosition {
    StencilReadMask = 0,
    StencilWriteMask = 1,
    DepthBias = 2,
    DepthBiasSlopeScale = 3,
    DepthStencilState = 4,
    MRTAttachments1 = 5,
    MRTAttachments2 = 6,
    RasterizationState = 7,
    ColorStates = 8,
    ShaderStage = 9,
    TextureStage = 10,
    VertexState = 11, // vertex state will consume positions 11, 12, ... depending on the number of vertex inputs

    NumStates = 12
}
```
So, the first node (`_Cache.values`) holds the stencil read mask values, the second node (`_Cache.values[stencilReadMaskValue]`) holds the stencil write mask values, the third node (`_Cache.values[stencilReadMaskValue].values[stencilWriteMaskValue]`) holds the depth bias values, and so on.

To find a pipeline in the cache, we start from the root node and look up its `values` property with the current `stencilReadMask` value, then look up the `values` property of the resulting node with the current `stencilWriteMask` value, and so on until all the states have been traversed.
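
The traversal described above can be sketched as follows. This is a minimal illustration and not the actual Babylon.js code: `PipelineNode`, `getOrCreatePipeline` and the `create` callback are hypothetical names, and the pipeline type is left generic so the snippet runs without a GPU device.

```typescript
// Hypothetical, simplified node: one tree level per state position.
class PipelineNode<P> {
    public values: { [id: number]: PipelineNode<P> } = {};
    public pipeline: P | undefined = undefined;
}

// Walks the tree with one id per state, creating the missing nodes on the way.
// `create` is only called when the final node holds no pipeline yet (cache miss).
function getOrCreatePipeline<P>(root: PipelineNode<P>, stateIds: number[], create: () => P): P {
    let node = root;
    for (const id of stateIds) {
        let child = node.values[id];
        if (child === undefined) {
            child = new PipelineNode<P>();
            node.values[id] = child;
        }
        node = child;
    }
    let pipeline = node.pipeline;
    if (pipeline === undefined) {
        pipeline = create();
        node.pipeline = pipeline;
    }
    return pipeline;
}

// Usage with strings standing in for GPURenderPipeline objects.
const demoRoot = new PipelineNode<string>();
let demoCreated = 0;
const makeDemoPipeline = () => `pipeline#${++demoCreated}`;

const pipelineA = getOrCreatePipeline(demoRoot, [0xff, 0xff, 0, 0], makeDemoPipeline); // miss: created
const pipelineB = getOrCreatePipeline(demoRoot, [0xff, 0xff, 0, 0], makeDemoPipeline); // hit: same object
const pipelineC = getOrCreatePipeline(demoRoot, [0xff, 0x0f, 0, 0], makeDemoPipeline); // miss: other branch
```

Only four state ids are used here to keep the example short; the real tree has one level per entry of `StatePosition`.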

## Optimization
The state positions are ordered so that the states least likely to change from one pipeline to the next come first. We maintain an index (`_stateDirtyLowestIndex`) that holds the lowest position among the states that have been marked dirty, that is, the states that have changed since the last pipeline lookup. When querying the cache, we start the traversal from this index instead of from 0. So, the higher `_stateDirtyLowestIndex` is, the better the performance: fewer nodes are traversed to find the pipeline in the cache.
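
The sketch below shows one way such a dirty index can shorten the traversal. It is an assumption-laden illustration, not the Babylon.js implementation: `StateNode`, `DirtyTrackingCache`, `setState`, `getPipeline`, the `_path` array and the `nodesVisitedLastLookup` counter are all hypothetical. Only the idea of `_stateDirtyLowestIndex` comes from the text above.

```typescript
// Hypothetical, simplified node (one tree level per state position).
class StateNode<P> {
    public values: { [id: number]: StateNode<P> } = {};
    public pipeline: P | undefined = undefined;
}

class DirtyTrackingCache<P> {
    // Number of nodes traversed by the last getPipeline call (for illustration only).
    public nodesVisitedLastLookup = 0;

    private _states: number[] = [];
    // _path[i] is the node reached after consuming states 0..i-1, so _path[0] is the root.
    private _path: StateNode<P>[] = [new StateNode<P>()];
    private _stateDirtyLowestIndex = 0;

    constructor(numStates: number) {
        for (let i = 0; i < numStates; i++) {
            this._states.push(0);
        }
    }

    public setState(position: number, value: number): void {
        if (this._states[position] !== value) {
            this._states[position] = value;
            this._stateDirtyLowestIndex = Math.min(this._stateDirtyLowestIndex, position);
        }
    }

    public getPipeline(create: () => P): P {
        const numStates = this._states.length;
        // Resume from the deepest node that is still valid instead of from the root.
        let node = this._path[this._stateDirtyLowestIndex];
        this.nodesVisitedLastLookup = 0;
        for (let i = this._stateDirtyLowestIndex; i < numStates; i++) {
            const id = this._states[i];
            let child = node.values[id];
            if (child === undefined) {
                child = new StateNode<P>();
                node.values[id] = child;
            }
            node = child;
            this._path[i + 1] = node;
            this.nodesVisitedLastLookup++;
        }
        this._stateDirtyLowestIndex = numStates; // nothing is dirty anymore
        let pipeline = node.pipeline;
        if (pipeline === undefined) {
            pipeline = create();
            node.pipeline = pipeline;
        }
        return pipeline;
    }
}

const trackedCache = new DirtyTrackingCache<string>(4);
let builtCount = 0;
const buildPipeline = () => `pipeline#${++builtCount}`;

const tracked1 = trackedCache.getPipeline(buildPipeline);
const visitedFirst = trackedCache.nodesVisitedLastLookup; // 4: full traversal
const tracked2 = trackedCache.getPipeline(buildPipeline);
const visitedUnchanged = trackedCache.nodesVisitedLastLookup; // 0: no state changed
trackedCache.setState(3, 1);
const tracked3 = trackedCache.getPipeline(buildPipeline);
const visitedLastStateChanged = trackedCache.nodesVisitedLastLookup; // 1: only the last level
```

Changing only the last state costs a single node visit, whereas changing state 0 would force a full traversal, which is why rarely changing states are placed first.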

Note that we tried an implementation where render pipelines were stored in a hash map and the lookup key was a string concatenation of the state values (see the class `WebGPUCacheRenderPipelineString`), but it was dropped because the node tree implementation was faster (around 2x at the time the comparison was made). The code has changed since then, so the comparison should probably be run again, but I still think the node tree is faster than the hash map.

Last note: the `values` property is a regular object and not a `Map` because, in our tests (with Chrome), using an object was faster.

## Monitoring the performances
Performance of the cache can be assessed by looking at these properties (each property must be prefixed with `BABYLON.WebGPUCacheRenderPipeline.`):

| property | description |
| ---------| ----------- |
| NumCacheHitWithoutHash | Number of times a render pipeline has been retrieved without even traversing the cache because there were no state changes since the last lookup: the last pipeline has been returned |
| NumCacheHitWithHash | Number of times a render pipeline has been retrieved by traversing the cache |
| NumCacheMiss | Number of times a new render pipeline has been created because it did not exist in the cache yet |
| NumPipelineCreationLastFrame | Number of render pipelines created during the last frame - new pipelines should not be created continuously so on average this value should be 0 |
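
As a rough illustration of how these counters could be read together, the sketch below derives a hit ratio from them. The `PipelineCacheCounters` interface, the `summarizePipelineCache` helper and the sample numbers are hypothetical, and the code assumes that hits plus misses add up to the total number of lookups.

```typescript
// Hypothetical shape mirroring the property names listed in the table above.
interface PipelineCacheCounters {
    NumCacheHitWithoutHash: number;
    NumCacheHitWithHash: number;
    NumCacheMiss: number;
    NumPipelineCreationLastFrame: number;
}

// Derives a few summary values from the raw counters.
function summarizePipelineCache(counters: PipelineCacheCounters): { lookups: number; hitRatio: number; creatingPipelines: boolean } {
    const hits = counters.NumCacheHitWithoutHash + counters.NumCacheHitWithHash;
    const lookups = hits + counters.NumCacheMiss; // assumption: every lookup is either a hit or a miss
    return {
        lookups,
        hitRatio: lookups === 0 ? 1 : hits / lookups,
        // Should be false most of the time once the scene has warmed up.
        creatingPipelines: counters.NumPipelineCreationLastFrame > 0,
    };
}

// Made-up sample values, for illustration only.
const sampleSummary = summarizePipelineCache({
    NumCacheHitWithoutHash: 600,
    NumCacheHitWithHash: 380,
    NumCacheMiss: 20,
    NumPipelineCreationLastFrame: 0,
});
```

In an application, the counters would be read from the prefixed properties described above rather than from a literal object.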



## Validation tests
These validation tests are excluded from WebGPU because some features are not implemented in Babylon.js yet:
* **GLTF Mesh Primitive Mode (0)** and **GLTF Mesh Primitive Mode (1)**: line loop / triangle fan not implemented yet
* **GLTF Buggy with Meshopt Compression**: formats other than `float` for the vertex buffers (*position*, *normal*, *uv*, ...) are not supported yet

When the features are implemented, the corresponding validation tests should be re-enabled.

The **Self shadowing** validation test generates rendering errors (but still passes because there are fewer than 2.5% errors) because it uses an exponential shadow map whose parameters (`depthScale` especially) depend on the precision of the depth map. In WebGL we use a 32-bit float texture, but in WebGPU it is only a half-float texture because linear filtering of 32-bit float textures is not supported (for the time being at least):

![WebGPU chart](/img/extensions/webgpu/webgpuValidationTestSelfShadowing.webp)

**Note to core developers: YOU SHOULD RUN THE VALIDATION TESTS LOCALLY OFTEN BECAUSE THEY ARE NOT RUN ON THE AZURE SERVERS!**

Run both test lists:
* WebGPU tests: http://localhost:1338/tests/validation/?list=webgpu&engine=webgpu
* Standard tests: http://localhost:1338/tests/validation/?list=config&engine=webgpu

You should also run them in the "check resource creation" mode: see [Check Resource Creation - WebGPU only](/contribute/toBabylon/validationTests#check-resource-creation---webgpu-only).

## TintWASM
You should update the **TWGSL** (Tint WASM) module regularly so that it stays in sync with the **Tint** source code.

* TintWASM module: https://github.com/syntheticmagus/twgsl
* Tint repository: https://dawn.googlesource.com/tint



This class implements a cache of GPU bind groups to avoid recreating them each frame.

## Cache implementation
The cache is a node tree:
```typescript
class WebGPUBindGroupCacheNode {
    public values: { [id: number]: WebGPUBindGroupCacheNode };
    public bindGroups: GPUBindGroup[];

    constructor() {
        this.values = {};
    }
}

export class WebGPUCacheBindGroups {
    private static _Cache: WebGPUBindGroupCacheNode = new WebGPUBindGroupCacheNode();
    ...
}
```
The `id` key in the `values` object is the id of a bind group resource: a uniform/storage buffer, a sampler or a texture. The `id` value for the uniform/storage buffer and texture is simply the `uniqueId` property of the corresponding class (`DataBuffer.uniqueId` and `InternalTexture.uniqueId / ExternalTexture.uniqueId` respectively). For the sampler, it is the sampler hash code (computed by `WebGPUCacheSampler.GetSamplerHashCode()`). The cache is traversed/built by looping over all the buffers/samplers/textures used by a shader (which is encapsulated in a `WebGPUPipelineContext`), in this order.
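
The traversal order described above (buffers, then samplers, then textures) can be sketched as follows. This is a simplified illustration and not the actual Babylon.js code: `BindGroupNode`, `getOrCreateBindGroups`, the `create` callback and the id values are hypothetical, and the bind group type is left generic so the snippet runs without a GPU device.

```typescript
// Hypothetical, simplified version of the cache node shown above.
class BindGroupNode<G> {
    public values: { [id: number]: BindGroupNode<G> } = {};
    public bindGroups: G[] | undefined = undefined;
}

// Walks (and builds) the tree using buffer ids, then sampler hash codes, then texture ids.
// `create` is only called when the final node holds no bind groups yet.
function getOrCreateBindGroups<G>(
    root: BindGroupNode<G>,
    bufferIds: number[],
    samplerHashCodes: number[],
    textureIds: number[],
    create: () => G[]
): G[] {
    let node = root;
    for (const id of bufferIds.concat(samplerHashCodes, textureIds)) {
        let child = node.values[id];
        if (child === undefined) {
            child = new BindGroupNode<G>();
            node.values[id] = child;
        }
        node = child;
    }
    let bindGroups = node.bindGroups;
    if (bindGroups === undefined) {
        bindGroups = create();
        node.bindGroups = bindGroups;
    }
    return bindGroups;
}

// Usage with strings standing in for GPUBindGroup objects.
const bindGroupRoot = new BindGroupNode<string>();
let bindGroupsCreated = 0;
const createDemoBindGroups = () => [`bindGroup#${++bindGroupsCreated}`];

const groupsA = getOrCreateBindGroups(bindGroupRoot, [10, 11], [7001], [42], createDemoBindGroups); // created
const groupsB = getOrCreateBindGroups(bindGroupRoot, [10, 11], [7001], [42], createDemoBindGroups); // reused
const groupsC = getOrCreateBindGroups(bindGroupRoot, [10, 11], [7001], [43], createDemoBindGroups); // other texture
```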

## Limits of the implementation
The location of a resource (the group and binding indices in the WGSL `@group(G) @binding(B)` syntax, formerly written `[[group(G), binding(B)]]`) is not factored into the id, and the ids are not *globally* unique because they do not come from the same pool (the buffer and texture `uniqueId` properties each use a separate pool). So, theoretically, collisions could occur where two different sets of resources point to the same cache entry. In practice this will likely never happen. Making the cache foolproof would make it even slower, and the WebGPU implementation already suffers a lot from having to maintain caches for some objects (bind groups and render pipelines mainly).
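
To make the theoretical collision concrete, here is a small sketch. It assumes the cache path is the plain concatenation of buffer ids, sampler hash codes and texture ids; `bindGroupCachePath` and the id values are hypothetical.

```typescript
// Hypothetical helper: the sequence of ids used to walk the cache tree.
function bindGroupCachePath(bufferIds: number[], samplerHashCodes: number[], textureIds: number[]): number[] {
    return bufferIds.concat(samplerHashCodes, textureIds);
}

// Shader 1 uses two buffers whose uniqueId values are 1 and 2.
const pathTwoBuffers = bindGroupCachePath([1, 2], [], []);
// Shader 2 uses one buffer (uniqueId 1) and one texture (uniqueId 2, from a separate id pool).
const pathBufferAndTexture = bindGroupCachePath([1], [], [2]);

// Both resource sets produce the path 1 -> 2, so they would share a cache entry.
const pathsCollide = pathTwoBuffers.join(",") === pathBufferAndTexture.join(",");
```

Avoiding this would require encoding the resource type or binding location in each id, which adds work to every lookup.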

Note also that all uniform buffers have an offset of 0 in Babylon and we don't have a use case where we would have the same buffer used with different capacity values: that means we don't need to take into account the offset/size of the buffer in the cache, only the id.

## Optimization
There is an optimization of the cache where we simply return the existing bind groups if the draw and material contexts did not change since the last cache query. Indeed, the draw context holds the list of the uniform/storage buffers and the material context the list of the textures and samplers used by the shader: if those lists did not change the previously created bind groups are still valid.
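
A minimal sketch of this early-out is shown below. It assumes each context exposes a dirty flag that is set whenever its resource list changes; `DirtyFlagContext`, `LastBindGroups`, `isDirty` and the two counters are hypothetical names used for illustration, not the Babylon.js API.

```typescript
// Hypothetical: a context that is flagged dirty when its list of resources changes.
interface DirtyFlagContext {
    isDirty: boolean;
}

class LastBindGroups<G> {
    public numLookups = 0; // cache traversals performed
    public numNoLookups = 0; // traversals skipped thanks to the early-out
    private _last: G[] | undefined = undefined;

    public get(drawContext: DirtyFlagContext, materialContext: DirtyFlagContext, traverseCache: () => G[]): G[] {
        const last = this._last;
        if (last !== undefined && !drawContext.isDirty && !materialContext.isDirty) {
            // Neither the buffers nor the textures/samplers changed: reuse the previous bind groups.
            this.numNoLookups++;
            return last;
        }
        this.numLookups++;
        const bindGroups = traverseCache();
        this._last = bindGroups;
        drawContext.isDirty = false;
        materialContext.isDirty = false;
        return bindGroups;
    }
}

const demoDrawContext = { isDirty: true };
const demoMaterialContext = { isDirty: true };
const lastBindGroups = new LastBindGroups<string>();
const traverseDemoCache = () => ["bindGroup"]; // stands in for the real tree traversal

const early1 = lastBindGroups.get(demoDrawContext, demoMaterialContext, traverseDemoCache); // traversal
const early2 = lastBindGroups.get(demoDrawContext, demoMaterialContext, traverseDemoCache); // early-out
demoMaterialContext.isDirty = true; // e.g. a texture was replaced
const early3 = lastBindGroups.get(demoDrawContext, demoMaterialContext, traverseDemoCache); // traversal again
```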

## Monitoring the performances
Performance of the cache can be assessed by looking at these properties (each property must be prefixed with `BABYLON.WebGPUCacheBindGroups.`):

| property | description |
| ---------| ----------- |
| NumBindGroupsCreatedTotal | Total number of bind groups created since the start of the program |
| NumBindGroupsCreatedLastFrame | Number of bind groups created during the last frame - for best cache usage this value should be 0 on average |
| NumBindGroupsLookupLastFrame | Number of bind groups retrieved by traversing the cache during the last frame |
| NumBindGroupsNoLookupLastFrame | Number of bind groups retrieved during the last frame without traversing the cache because no changes in buffers/textures/samplers occurred since the last cache query for this shader |