Name

    NV_shader_buffer_load

Name Strings

    GL_NV_shader_buffer_load

Contact

    Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)

Contributors

    Pat Brown, NVIDIA
    Chris Dodd, NVIDIA
    Mark Kilgard, NVIDIA
    Eric Werness, NVIDIA

Status

    Complete

Version

    Last Modified Date:         August 8, 2010
    Author Revision:            8

Number

    379

Dependencies

    Written against the OpenGL 3.0 Specification.

    Written against the GLSL 1.30 Specification (Revision 09).

    This extension interacts with NV_gpu_program4. 


Overview

    At a very coarse level, GL has evolved in a way that allows 
    applications to replace many of the original state machine variables 
    with blocks of user-defined data. For example, the current vertex 
    state has been augmented by vertex buffer objects, fixed-function 
    shading state and parameters have been replaced by shaders/programs 
    and constant buffers, etc. Applications switch between coarse sets 
    of state by binding objects to the context or to other container 
    objects (e.g. vertex array objects) instead of manipulating state 
    variables of the context. In terms of the number of GL commands 
    required to draw an object, modern applications are orders of 
    magnitude more efficient than legacy applications, but this explosion 
    of objects bound to other objects has led to a new bottleneck - 
    pointer chasing and CPU L2 cache misses in the driver, and general 
    L2 cache pollution.

    This extension provides a mechanism to read from a flat, 64-bit GPU 
    address space from programs/shaders, to query GPU addresses of buffer
    objects at the API level, and to bind buffer objects to the context in
    such a way that they can be accessed via their GPU addresses in any 
    shader stage. 
    
    The intent is that applications can avoid re-binding buffer objects 
    or updating constants between each Draw call and instead simply use 
    a VertexAttrib (or TexCoord, or InstanceID, or...) to "point" to the 
    new object's state. In this way, one of the cheapest "state" updates 
    (from the CPU's point of view) can be used to effect a significant 
    state change in the shader similarly to how a pointer change may on 
    the CPU. At the same time, this relieves the limits on how many 
    buffer objects can be accessed at once by shaders, and allows these 
    buffer object accesses to be exposed as C-style pointer dereferences
    in the shading language.

    As a very simple example, imagine packing a group of similar objects' 
    constants into a single buffer object and pointing your program
    at object <i> by setting "glVertexAttribI1iEXT(attrLoc, i);"
    and using a shader as such:

        struct MyObjectType {
            mat4x4 modelView;
            vec4 materialPropertyX;
            // etc.
        };
        uniform MyObjectType *allObjects;
        in int objectID; // bound to attrLoc
        
        ...

        mat4x4 thisObjectsMatrix = allObjects[objectID].modelView;
        // do transform, shading, etc.

    This is beneficial in much the same way that texture arrays allow 
    choosing between similar, but independent, texture maps with a single
    coordinate identifying which slice of the texture to use. It also
    resembles instancing, where a lightweight change (incrementing the 
    instance ID) can be used to generate a different and interesting 
    result, but with additional flexibility over instancing because the 
    values are app-controlled and not a single incrementing counter.
    
    Dependent pointer fetches are allowed, so more complex scene graph 
    structures can be built into buffer objects providing significant new 
    flexibility in the use of shaders. Another simple example, showing 
    something you can't do with existing functionality, is to do dependent
    fetches into many buffer objects:

        GenBuffers(N, dataBuffers);
        GenBuffers(1, &pointerBuffer);

        GLuint64EXT gpuAddrs[N];
        for (i = 0; i < N; ++i) {
            BindBuffer(target, dataBuffers[i]);
            BufferData(target, size[i], myData[i], STATIC_DRAW);
            
            // get the address of this buffer and make it resident.
            GetBufferParameterui64vNV(target, BUFFER_GPU_ADDRESS_NV, 
                                      &gpuAddrs[i]); 
            MakeBufferResidentNV(target, READ_ONLY);
        }

        GLuint64EXT pointerBufferAddr;
        BindBuffer(target, pointerBuffer);
        BufferData(target, sizeof(GLuint64EXT)*N, gpuAddrs, STATIC_DRAW);
        GetBufferParameterui64vNV(target, BUFFER_GPU_ADDRESS_NV, 
                                  &pointerBufferAddr); 
        MakeBufferResidentNV(target, READ_ONLY);

        // now in the shader, we can use a double indirection through a
        // pointer uniform set to pointerBufferAddr (e.g. via Uniformui64NV)
        uniform vec4 **ptrToBuffers;
        vec4 *ptrToBufferI = ptrToBuffers[i];

    This allows simultaneous access to more buffers than 
    EXT_bindable_uniform permits (MAX_VERTEX_BINDABLE_UNIFORMS, etc.), and
    each buffer can be larger than MAX_BINDABLE_UNIFORM_SIZE.
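    The double indirection above has a direct CPU-side analogue, shown
    here only to make the data flow concrete. In this sketch, which is not
    part of the extension, ordinary host arrays stand in for buffer
    objects, and their host addresses, widened to 64-bit integers, stand
    in for the GPU addresses stored in <pointerBuffer>. The type vec4_t and
    the function fetch_indirect are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the GLSL vec4 type. */
typedef struct { float x, y, z, w; } vec4_t;

/* Emulates "vec4 *ptrToBufferI = ptrToBuffers[i]; ... ptrToBufferI[j]".
 * pointer_buffer holds n 64-bit addresses, like the GLuint64EXT values
 * written to pointerBuffer; each address refers to an array of vec4_t.
 * j is not range-checked, just as a shader dereference is not. */
vec4_t fetch_indirect(const uint64_t *pointer_buffer, size_t n,
                      size_t i, size_t j)
{
    assert(i < n);                         /* stay inside the pointer table */
    const vec4_t *buffer_i = (const vec4_t *)(uintptr_t)pointer_buffer[i];
    assert(buffer_i != NULL);              /* address zero is never a buffer */
    return buffer_i[j];                    /* second, dependent fetch */
}
```

    Each call performs two dependent loads: one to read an address out of
    the pointer table, and one through that address.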

New Procedures and Functions

    void MakeBufferResidentNV(enum target, enum access);
    void MakeBufferNonResidentNV(enum target);
    boolean IsBufferResidentNV(enum target);
    void MakeNamedBufferResidentNV(uint buffer, enum access);
    void MakeNamedBufferNonResidentNV(uint buffer);
    boolean IsNamedBufferResidentNV(uint buffer);

    void GetBufferParameterui64vNV(enum target, enum pname, 
                                   uint64EXT *params);
    void GetNamedBufferParameterui64vNV(uint buffer, enum pname, 
                                        uint64EXT *params);

    void GetIntegerui64vNV(enum value, uint64EXT *result);

    void Uniformui64NV(int location, uint64EXT value);
    void Uniformui64vNV(int location, sizei count,
                               const uint64EXT *value);
    void GetUniformui64vNV(uint program, int location, uint64EXT *params);
    void ProgramUniformui64NV(uint program, int location, uint64EXT value);
    void ProgramUniformui64vNV(uint program, int location, sizei count, 
                               const uint64EXT *value);

New Tokens

    Accepted by the <pname> parameter of GetBufferParameterui64vNV,
    GetNamedBufferParameterui64vNV:

        BUFFER_GPU_ADDRESS_NV                          0x8F1D

    Returned by the <type> parameter of GetActiveUniform:
    
        GPU_ADDRESS_NV                                 0x8F34

    Accepted by the <value> parameter of GetIntegerui64vNV: 

        MAX_SHADER_BUFFER_ADDRESS_NV                   0x8F35


Additions to Chapter 2 of the OpenGL 3.0 Specification (OpenGL Operation)

    Append to Section 2.9 (p. 45)

    The data store of a buffer object may be made accessible to the GL
    via shader buffer loads by calling:

        void MakeBufferResidentNV(enum target, enum access);

    <access> must be READ_ONLY; the parameter is provided so that future
    extensions can indicate to the driver that the GPU may write to the
    memory. <target> may be any of the buffer targets accepted by
    BindBuffer.  The error INVALID_OPERATION will be generated if no
    buffer is bound to <target>, if the buffer bound to <target> is
    already resident in the current GL context, or if the buffer bound to
    <target> has no data store.

    While the buffer object is resident, it is legal to use GPU addresses 
    in the range [BUFFER_GPU_ADDRESS, BUFFER_GPU_ADDRESS + BUFFER_SIZE) 
    in any shader stage.
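    The rule above amounts to a half-open interval test against every
    resident buffer. The following C sketch models that test on the
    application side. The ResidentRange structure and the
    address_is_resident function are hypothetical bookkeeping helpers, not
    part of the GL API, and an implementation is not required to track
    residency this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one resident buffer: its BUFFER_GPU_ADDRESS_NV
 * value and its BUFFER_SIZE in basic machine units. */
typedef struct {
    uint64_t gpu_address;
    uint64_t size;
} ResidentRange;

/* Returns 1 if the access [address, address + bytes) lies entirely inside
 * one resident range [gpu_address, gpu_address + size), otherwise 0.
 * The comparisons are arranged so that no intermediate sum can overflow. */
int address_is_resident(const ResidentRange *ranges, size_t count,
                        uint64_t address, uint64_t bytes)
{
    assert(ranges != NULL || count == 0);
    for (size_t k = 0; k < count; ++k) {
        const ResidentRange *r = &ranges[k];
        if (bytes > 0 && bytes <= r->size &&
            address >= r->gpu_address &&
            address - r->gpu_address <= r->size - bytes)
            return 1;
    }
    return 0;
}
```

    For a 256-byte buffer at 0x10000, a 16-byte access at 0x100F0 ends
    exactly at the exclusive upper bound and is accepted, while the same
    access at 0x100F8 runs past it and is rejected.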

    The data store of a buffer object may be made inaccessible to the GL
    via shader buffer loads by calling:
    
        void MakeBufferNonResidentNV(enum target);

    A buffer is also made non-resident implicitly as a result of being
    respecified via BufferData or being deleted. <target> may be any of 
    the buffer targets accepted by BindBuffer.  The error 
    INVALID_OPERATION will be generated if no buffer is bound to <target>
    or if the buffer bound to <target> is not resident in the current
    GL context.

    The function:

        void GetBufferParameterui64vNV(enum target, enum pname, 
                                       uint64EXT *params);

    may be used to query the GPU address of a buffer object's data store. 
    This address remains valid until the buffer object is deleted or its
    data store is respecified via BufferData. The address "zero"
    is reserved for convenience, so no buffer object will ever have an 
    address of zero.  The error INVALID_OPERATION will be generated if no
    buffer is bound to <target>, or if the buffer bound to <target> has no
    data store.

    The functions:

        void MakeNamedBufferResidentNV(uint buffer, enum access);
        void MakeNamedBufferNonResidentNV(uint buffer);
        void GetNamedBufferParameterui64vNV(uint buffer, enum pname, 
                                            uint64EXT *params);
   
    operate identically to the non-"Named" functions except that, rather
    than using the currently bound buffer, they use the buffer object
    identified by <buffer>.  If the buffer object named by the buffer
    parameter has
    not been previously bound or has been deleted since the last binding,
    the GL first creates a new state vector, initialized with a zero-sized
    memory buffer and comprising the state values listed in table 2.6.
    There is no buffer corresponding to the name zero; these commands
    generate the INVALID_OPERATION error if the buffer parameter is zero.

    Add to Section 2.20.3 (p. 98)

        void Uniformui64NV(int location, uint64EXT value);
        void Uniformui64vNV(int location, sizei count,
                            const uint64EXT *value);

    The Uniformui64NV command loads a single uint64EXT value, and the
    Uniformui64vNV command loads <count> uint64EXT values, into a uniform
    location defined as a GPU_ADDRESS_NV or an array of GPU_ADDRESS_NVs.

    The functions:

        void ProgramUniformui64NV(uint program, int location, 
                                  uint64EXT value);
        void ProgramUniformui64vNV(uint program, int location, sizei count, 
                                   const uint64EXT *value);
   
    operate identically to the non-"Program" functions except that, rather
    than updating the program object currently in use, these "Program"
    commands update the program object named by the initial <program>
    parameter.


    Insert a new subsection after Section 2.20.4, Shader Execution (Vertex
    Shaders), p. 103.

    Section 2.20.X, Shader Memory Access

    Shaders may load from buffer object memory by dereferencing pointer
    variables.  Pointer variables are 64-bit unsigned integer values referring
    to the GPU addresses of data stored in buffer objects made resident by
    MakeBufferResidentNV.  The GPU addresses of such buffer objects may be
    queried using GetBufferParameterui64vNV with a <pname> of
    BUFFER_GPU_ADDRESS_NV.

    When a shader dereferences a pointer variable, data are read from buffer
    object memory according to the following rules:

    - Data of type "bool" are stored in memory as one uint-typed value at the
      specified GPU address.  All non-zero values correspond to true, and zero
      corresponds to false.

    - Data of type "int" are stored in memory as one int-typed value at the
      specified GPU address.

    - Data of type "uint" are stored in memory as one uint-typed value at the
      specified GPU address.
 
    - Data of type "float" are stored in memory as one float-typed value at
      the specified GPU address.

    - Vectors with <N> elements with any of the above basic element types are
      stored in memory as <N> values in consecutive memory locations beginning
      at the specified GPU address, with components stored in order with the
      first (X) component at the lowest offset.  The data type used for
      individual components is derived according to the rules for scalar
      members above.

    - Data with any pointer type are stored in memory as a single 64-bit
      unsigned integer value at the specified GPU address.

    - Column-major matrices with <C> columns and <R> rows (using the type
      "mat<C>x<R>", or simply "mat<C>" if <C>==<R>) are treated as an array of
      <C> floating-point column vectors, each consisting of <R> components.
      The column vectors will be stored in order, with column zero at the
      lowest offset.  The difference in offsets between consecutive columns of
      the matrix will be referred to as the column stride, and is constant
      across the matrix.

    - Row-major matrices with <C> columns and <R> rows (using the type
      "mat<C>x<R>", or simply "mat<C>" if <C>==<R>) are treated as an array of
      <R> floating-point row vectors, each consisting of <C> components. The
      row vectors will be stored in order, with row zero at the lowest offset.
      The difference in offsets between consecutive rows of the matrix will be
      referred to as the row stride, and is constant across the matrix.
 
    - Arrays of scalars, vectors, pointers, and matrices are stored in memory
      by element order, with array member zero at the lowest offset.  The
      difference in offsets between each pair of elements in the array in
      basic machine units is referred to as the array stride, and is constant
      across the entire array.
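    As an illustration of the matrix rules, the C sketch below computes
    where a single matrix element lives relative to the start of the
    matrix, given the column or row stride. The function name and its
    parameters are hypothetical; the stride itself must come from the
    structure layout rules that follow, and 32-bit float components are
    assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of element (column col, row row) of a matrix of 32-bit
 * floats. For column-major storage, stride is the column stride and the
 * elements of one column are consecutive; for row-major storage, stride is
 * the row stride and the elements of one row are consecutive. */
uint64_t matrix_element_offset(int row_major, uint64_t stride,
                               uint32_t col, uint32_t row)
{
    const uint64_t component_size = 4;   /* one float, in basic machine units */
    assert(col < 4 && row < 4);          /* GLSL matrices are at most 4x4 */
    if (row_major)
        return row * stride + col * component_size;
    return col * stride + row * component_size;
}
```

    For a column-major mat3, each column is a three-component vector whose
    stride is 16 under rules (3) and (4), so element (2, 2) sits at offset
    40 rather than at the tightly packed offset 32.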

    For matrix and array variables, the matrix and/or array strides
    corresponding to the variable may be derived according to the structure
    layout rules specified immediately below.

    When dereferencing a pointer to a structure, its individual members will
    be laid out in memory in monotonically increasing order based on their
    location in the structure declaration.  Each structure member has a base
    offset and a base alignment, from which an aligned offset is computed by
    rounding the base offset up to the next multiple of the base alignment.
    The base offset of the first member of a structure is taken from the
    aligned offset of the structure itself.  The base offset of all other
    structure members is derived by taking the offset of the last basic
    machine unit consumed by the previous member and adding one.  Each
    structure member is stored in memory at its aligned offset.

      (1) If the member is a scalar consuming <N> basic machine units, the
          base alignment is <N>.

      (2) If the member is a two- or four-component vector with components
          consuming <N> basic machine units, the base alignment is 2<N> or
          4<N>, respectively.

      (3) If the member is a three-component vector with components consuming
          <N> basic machine units, the base alignment is 4<N>.

      (4) If the member is an array of scalars or vectors, the base alignment
          and array stride are set to match the base alignment of a single
          array element, according to rules (1), (2), and (3). The array may
          have padding at the end; the base offset of the member following the
          array is rounded up to the next multiple of the base alignment.

      (5) If the member is a column-major matrix with <C> columns and <R>
          rows, the matrix is stored identically to an array of <C> column
          vectors with <R> components each, according to rule (4).

      (6) If the member is an array of <S> column-major matrices with <C>
          columns and <R> rows, the array is stored identically to an array
          of <S>*<C> column vectors with <R> components each, according to
          rule (4).

      (7) If the member is a row-major matrix with <C> columns and <R> rows,
          the matrix is stored identically to an array of <R> row vectors
          with <C> components each, according to rule (4).

      (8) If the member is an array of <S> row-major matrices with <C> columns
          and <R> rows, the array is stored identically to an array of
          <S>*<R> row vectors with <C> components each, according to rule (4).

      (9) If the member is a structure, the base alignment of the structure is
          <N>, where <N> is the largest base alignment value of any of its
          members.  The individual members of this sub-structure are then
          assigned offsets by applying this set of rules recursively, where
          the base offset of the first member of the sub-structure is equal to
          the aligned offset of the structure. The structure may have padding
          at the end; the base offset of the member following the
          sub-structure is rounded up to the next multiple of the base
          alignment of the structure.

      (10) If the member is an array of <S> structures, the <S> elements of
           the array are laid out in order, according to rule (9).
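    Rules (1) through (4) can be applied mechanically. The following C
    sketch assigns aligned offsets to a flat list of scalar, vector, and
    array-of-vector members; it deliberately omits matrices and nested
    structures (rules (5) through (10)). The Member descriptor and the
    function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical member descriptor: a scalar or vector with `components`
 * elements of `comp_size` basic machine units each, or an array of `count`
 * such elements (count == 0 means "not an array"). */
typedef struct {
    size_t comp_size;     /* N in rules (1)-(3), e.g. 4 for float/int/uint */
    unsigned components;  /* 1 to 4 */
    size_t count;         /* array length, or 0 */
} Member;

static size_t round_up(size_t x, size_t a) { return (x + a - 1) / a * a; }

/* Rules (1)-(3): scalars align to N, two-component vectors to 2N,
 * three- and four-component vectors to 4N. */
size_t base_alignment(const Member *m)
{
    assert(m->components >= 1 && m->components <= 4);
    if (m->components == 1) return m->comp_size;
    if (m->components == 2) return 2 * m->comp_size;
    return 4 * m->comp_size;
}

/* Writes each member's aligned offset to offsets[] and returns the base
 * offset that a following member would start from. */
size_t layout_members(const Member *members, size_t n, size_t *offsets)
{
    size_t base = 0;  /* base offset of the next member */
    for (size_t i = 0; i < n; ++i) {
        const Member *m = &members[i];
        size_t align = base_alignment(m);
        offsets[i] = round_up(base, align);
        if (m->count == 0)
            /* last basic machine unit consumed, plus one */
            base = offsets[i] + m->components * m->comp_size;
        else
            /* rule (4): stride equals the element's base alignment, and the
             * trailing padding rounds the next base offset up to it */
            base = offsets[i] + m->count * align;
    }
    return base;
}
```

    For example, the members of struct { float a; vec3 b; float c; vec2 d;
    float e[3]; } receive offsets 0, 16, 28, 32, and 40; <c> packs into the
    four units left after the three-component <b>.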

    If a shader reads from a GPU address that does not correspond to a buffer
    object made resident by MakeBufferResidentNV, the results of the operation
    are undefined and may result in application termination.

    Any variable, array element, or structure member accessed using a pointer
    has a required base alignment, which may be derived according to the
    structure layout rules above.  If a variable, array member, or structure
    member is accessed using a pointer that is not a multiple of its base
    alignment, the results of the access will be undefined.  To store multiple
    variables in a single buffer object, an application must ensure that each
    variable is properly aligned.  Storing a single scalar, vector, matrix,
    array, or structure variable using a pointer set to the base GPU address
    of a resident buffer object requires no special alignment.  The base GPU
    address of a buffer object is guaranteed to be sufficiently aligned to
    satisfy the base alignment requirement of any variable, and the layout
    rules above ensure that individual matrix rows/columns, array elements,
    and structure members are properly aligned as long as the base pointer
    meets alignment requirements.


Additions to Chapter 5 of the OpenGL 3.0 Specification (Special Functions)

    Add to Section 5.4, p. 310 (Display Lists)

    Edit the list of commands that are executed immediately when compiling
    a display list to include MakeBufferResidentNV, 
    MakeBufferNonResidentNV, MakeNamedBufferResidentNV, 
    MakeNamedBufferNonResidentNV, GetBufferParameterui64vNV, 
    GetNamedBufferParameterui64vNV, IsBufferResidentNV, and
    IsNamedBufferResidentNV.

Additions to Chapter 6 of the OpenGL 3.0 Specification (Querying GL State)

    Add to Section 6.1.11, p. 314 (Pointer, String, and 64-bit Queries)

    The command:
        
        void GetIntegerui64vNV(enum value, uint64EXT *result);

    obtains 64-bit unsigned integer state variables. Legal values of 
    <value> are only those that specify GetIntegerui64vNV in the state
    tables in Chapter 6.

    Add to Section 6.1.13, p. 332 (Buffer Object Queries)

    The commands:

        boolean IsBufferResidentNV(enum target);
        boolean IsNamedBufferResidentNV(uint buffer);

    return TRUE if the specified buffer is resident in the current context.
    The error INVALID_OPERATION will be generated by IsBufferResidentNV if no
    buffer is bound to <target>.  If the buffer object named by the buffer
    parameter of IsNamedBufferResidentNV has not been previously bound or has
    been deleted since the last binding, the GL first creates a new state
    vector, initialized with a zero-sized memory buffer and comprising the
    state values listed in table 2.6.  There is no buffer corresponding to the
    name zero; IsNamedBufferResidentNV generates the INVALID_OPERATION error if
    the buffer parameter is zero.

    Add to Section 6.1.15, p. 337 (Shader and Program Queries)

        void GetUniformui64vNV(uint program, int location, uint64EXT *params);

Additions to Appendix D of the OpenGL 3.0 Specification (Shared Objects and Multiple Contexts)

    Add a new section D.X (Object Use by GPU Address)

    A buffer object's GPU address is valid in all contexts in the share
    group that the buffer belongs to. A buffer should be made resident in
    each context that will use it via its GPU address, so that the GL
    knows that the buffer is used in each context's command stream.

Additions to the NV_gpu_program4 specification:

    Change Section 2.X.2, Program Grammar

    If a program specifies the NV_shader_buffer_load program option, 
    the following modifications apply to the program grammar:

    Append to <opModifier> list: | "F32" | "F32X2" | "F32X4" | "S8" | "S16" | 
    "S32" | "S32X2" | "S32X4" | "U8" | "U16" | "U32" | "U32X2" | "U32X4".

    Append to <SCALARop> list: | "LOAD".

    Modify Section 2.X.4, Program Execution Environment

    (Add to the set of opcodes in Table X.13)

                  Modifiers 
      Instruction F I C S H D  Out Inputs    Description
      ----------- - - - - - -  --- --------  --------------------------------
      LOAD        X X X X - F  v   su        Global load


    (Add to Table X.14, Instruction Modifiers, and to the corresponding
    description following the table)

      Modifier  Description
      --------  -----------------------------------------------
      F32       Access one 32-bit floating-point value
      F32X2     Access two 32-bit floating-point values
      F32X4     Access four 32-bit floating-point values
      S8        Access one 8-bit signed integer value
      S16       Access one 16-bit signed integer value
      S32       Access one 32-bit signed integer value
      S32X2     Access two 32-bit signed integer values
      S32X4     Access four 32-bit signed integer values
      U8        Access one 8-bit unsigned integer value
      U16       Access one 16-bit unsigned integer value
      U32       Access one 32-bit unsigned integer value
      U32X2     Access two 32-bit unsigned integer values
      U32X4     Access four 32-bit unsigned integer values

    For memory load operations, the "F32", "F32X2", "F32X4", "S8", "S16",
    "S32", "S32X2", "S32X4", "U8", "U16", "U32", "U32X2", and "U32X4" storage
    modifiers control how data are loaded from memory.  Storage modifiers are
    supported by the LOAD instruction and are covered in more detail in the
    descriptions of that instruction.  LOAD must specify exactly one of these
    modifiers, and may not specify any of the base data type modifiers (F,U,S)
    described above.  The base data type of the result vector of a LOAD
    instruction is trivially derived from the storage modifier.


    Add New Section 2.X.4.5, Program Memory Access

    Programs may load from buffer object memory via the LOAD (global load)
    instruction.

    Load instructions read 8, 16, 32, 64, or 128 bits of data from a source
    address to produce a four-component vector, according to the storage
    modifier specified with the instruction.  The storage modifier has three
    parts:

      - a base data type, "F", "S", or "U", specifying that the instruction
        fetches floating-point, signed integer, or unsigned integer values,
        respectively;

      - a component size, specifying that the components fetched by the
        instruction have 8, 16, or 32 bits; and

      - an optional component count, where "X2" and "X4" indicate that two or
        four components be fetched, and no count indicates a single component
        fetch.
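    The three-part structure of a storage modifier can be decoded
    directly from its name. This C sketch, with a hypothetical
    StorageModifier type and parse_storage_modifier function, accepts only
    the thirteen modifiers in Table X.14 and splits each into base type,
    component size, and component count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Decomposed LOAD storage modifier, e.g. "S32X2" -> {'S', 32, 2}. */
typedef struct {
    char base_type;   /* 'F', 'S', or 'U' */
    unsigned bits;    /* component size: 8, 16, or 32 */
    unsigned count;   /* 1, 2, or 4 components fetched */
} StorageModifier;

/* Returns 1 and fills *out for one of the modifiers listed in Table X.14,
 * or returns 0 for anything else. */
int parse_storage_modifier(const char *s, StorageModifier *out)
{
    static const char *const valid[] = {
        "F32", "F32X2", "F32X4", "S8", "S16", "S32", "S32X2", "S32X4",
        "U8", "U16", "U32", "U32X2", "U32X4"
    };
    int known = 0;
    for (size_t i = 0; i < sizeof valid / sizeof valid[0]; ++i)
        if (strcmp(s, valid[i]) == 0)
            known = 1;
    if (!known)
        return 0;

    char *rest;
    out->base_type = s[0];
    out->bits = (unsigned)strtoul(s + 1, &rest, 10);   /* digits after type */
    out->count = (*rest == 'X') ? (unsigned)strtoul(rest + 1, NULL, 10) : 1;
    assert(out->bits * out->count <= 128);   /* at most 128 bits per load */
    return 1;
}
```

    The product of <bits> and <count> gives the 8, 16, 32, 64, or 128 bits
    read by the instruction.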

    When the storage modifier specifies that fewer than four components should
    be fetched, remaining components are filled with zeroes.  When performing
    a global load (LOAD), the GPU address is specified as an instruction
    operand.  Given a GPU address <address> and a storage modifier <modifier>,
    the memory load can be described by the following code:

      result_t_vec BufferMemoryLoad(char *address, OpModifier modifier)
      {
        result_t_vec result = { 0, 0, 0, 0 };
        switch (modifier) {
        case F32:
            result.x = ((float32_t *)address)[0];
            break;
        case F32X2:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            break;
        case F32X4:
            result.x = ((float32_t *)address)[0];
            result.y = ((float32_t *)address)[1];
            result.z = ((float32_t *)address)[2];
            result.w = ((float32_t *)address)[3];
            break;
        case S8:
            result.x = ((int8_t *)address)[0];
            break;
        case S16:
            result.x = ((int16_t *)address)[0];
            break;
        case S32:
            result.x = ((int32_t *)address)[0];
            break;
        case S32X2:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            break;
        case S32X4:
            result.x = ((int32_t *)address)[0];
            result.y = ((int32_t *)address)[1];
            result.z = ((int32_t *)address)[2];
            result.w = ((int32_t *)address)[3];
            break;
        case U8:
            result.x = ((uint8_t *)address)[0];
            break;
        case U16:
            result.x = ((uint16_t *)address)[0];
            break;
        case U32:
            result.x = ((uint32_t *)address)[0];
            break;
        case U32X2:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            break;
        case U32X4:
            result.x = ((uint32_t *)address)[0];
            result.y = ((uint32_t *)address)[1];
            result.z = ((uint32_t *)address)[2];
            result.w = ((uint32_t *)address)[3];
            break;
        }
        return result;
      }
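    The pseudocode above can be turned into a compilable C sketch that runs
    against host memory. It is a model under stated assumptions, not
    driver code: result_t_vec is represented here as a union of float and
    32-bit integer views, and memcpy is used so that reinterpreting the
    loaded bytes is well defined in C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { F32, F32X2, F32X4, S8, S16, S32, S32X2, S32X4,
               U8, U16, U32, U32X2, U32X4 } OpModifier;

/* Assumed result representation: float view for F32*, integer views
 * for the S* and U* modifiers. */
typedef union {
    float    f[4];
    int32_t  s[4];
    uint32_t u[4];
} result_t_vec;

result_t_vec BufferMemoryLoad(const void *address, OpModifier modifier)
{
    result_t_vec result;
    assert(address != NULL);
    memset(&result, 0, sizeof result);  /* unfetched components are zero */
    switch (modifier) {
    case F32: case S32: case U32:        /* one 32-bit component */
        memcpy(result.u, address, 4); break;
    case F32X2: case S32X2: case U32X2:  /* two 32-bit components */
        memcpy(result.u, address, 8); break;
    case F32X4: case S32X4: case U32X4:  /* four 32-bit components */
        memcpy(result.u, address, 16); break;
    case S8:  { int8_t v;   memcpy(&v, address, 1); result.s[0] = v; break; }
    case S16: { int16_t v;  memcpy(&v, address, 2); result.s[0] = v; break; }
    case U8:  { uint8_t v;  memcpy(&v, address, 1); result.u[0] = v; break; }
    case U16: { uint16_t v; memcpy(&v, address, 2); result.u[0] = v; break; }
    }
    return result;
}
```

    The S8 and S16 cases sign-extend into a 32-bit component, while U8 and
    U16 zero-extend, so a byte holding 0xFF loads as -1 with S8 and as 255
    with U8.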

    If a global load accesses a memory address that does not correspond to a
    buffer object made resident by MakeBufferResidentNV, the results of the
    operation are undefined and may result in application termination.

    The address used for the buffer memory loads must be aligned to the fetch
    size corresponding to the storage opcode modifier.  For S8 and U8, the
    offset has no alignment requirements.  For S16 and U16, the offset must be
    a multiple of two basic machine units.  For F32, S32, and U32, the offset
    must be a multiple of four.  For F32X2, S32X2, and U32X2, the offset must
    be a multiple of eight.  For F32X4, S32X4, and U32X4, the offset must be a
    multiple of sixteen.  If an offset is not correctly aligned, the values
    returned by a buffer memory load will be undefined.
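    Because the required alignment always equals the number of bytes
    fetched, the rule reduces to a table lookup and a modulus. The
    following C sketch, using a hypothetical load_is_aligned helper, checks
    an address against that table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Required address alignment for each LOAD storage modifier, in basic
 * machine units; it equals the total number of bytes fetched. */
static const struct { const char *name; unsigned align; } kLoadAlign[] = {
    {"S8", 1},     {"U8", 1},     {"S16", 2},    {"U16", 2},
    {"F32", 4},    {"S32", 4},    {"U32", 4},
    {"F32X2", 8},  {"S32X2", 8},  {"U32X2", 8},
    {"F32X4", 16}, {"S32X4", 16}, {"U32X4", 16},
};

/* Returns 1 if address satisfies the alignment rule for modifier, or 0 if
 * it does not (in which case the loaded values would be undefined). */
int load_is_aligned(uint64_t address, const char *modifier)
{
    for (size_t i = 0; i < sizeof kLoadAlign / sizeof kLoadAlign[0]; ++i)
        if (strcmp(modifier, kLoadAlign[i].name) == 0)
            return address % kLoadAlign[i].align == 0;
    assert(!"unknown storage modifier");
    return 0;
}
```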


    Modify Section 2.X.6, Program Options

    + Shader Buffer Load Support (NV_shader_buffer_load)

    If a program specifies the "NV_shader_buffer_load" option, it may use the
    LOAD instruction to load data from a resident buffer object given a GPU
    address.


    Section 2.X.8.Z, LOAD:  Global Load

    The LOAD instruction generates a result vector by reading an address from
    the single unsigned integer scalar operand and fetching data from buffer
    object memory, as described in Section 2.X.4.5.

      address = ScalarLoad(op0);
      result = BufferMemoryLoad(address, storageModifier);

    LOAD supports no base data type modifiers, but requires exactly one
    storage modifier.  The base data type of the result vector is derived from
    the storage modifier.  The single scalar operand is always interpreted as
    an unsigned integer.

    The range of GPU addresses supported by the LOAD instruction may be
    subject to an implementation-dependent limit.  If any component fetched by
    the LOAD instruction corresponds to memory with an address larger than the
    value of MAX_SHADER_BUFFER_ADDRESS_NV, the value fetched for that
    component will be undefined.
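    The per-component nature of this limit can be modeled as a bit mask of
    components whose values would be undefined. The paragraph above does
    not say whether a component's address means its first or its last
    byte; the C sketch below assumes, conservatively, that a component is
    out of range when any of its bytes lies above the limit. The function
    undefined_components is hypothetical, and <max_address> stands for the
    value returned by GetIntegerui64vNV for MAX_SHADER_BUFFER_ADDRESS_NV.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a mask with bit i set when component i of a LOAD that fetches
 * `count` components of `bytes_per_component` bytes each, starting at
 * `address`, has any byte above max_address. Assumes the fetch does not
 * wrap around 2^64. */
unsigned undefined_components(uint64_t address, unsigned count,
                              unsigned bytes_per_component,
                              uint64_t max_address)
{
    assert(count == 1 || count == 2 || count == 4);
    unsigned mask = 0;
    for (unsigned i = 0; i < count; ++i) {
        uint64_t last = address + (uint64_t)(i + 1) * bytes_per_component - 1;
        if (last > max_address)
            mask |= 1u << i;
    }
    return mask;
}
```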


Modifications to The OpenGL Shading Language Specification, Version 1.30.09

    Modify Section 3.6, Keywords, p. 14

    (add the following to the list of reserved keywords)

    intptr_t 
    uintptr_t


    Modify Section 4.1, Basic Types, p. 18

    (add to the basic "Transparent Types" table, p. 18)

      Types       Meaning
      --------    ----------------------------------------------------------
      intptr_t    a signed integer with the same precision as a pointer
      uintptr_t   an unsigned integer with the same precision as a pointer

    (replace the last paragraph of the section with the following)

    Pointers to any of the transparent types, user-defined structs, or other
    pointer types are supported.


    Modify Section 4.1.3, Integers, p. 18

    (add to the end of the first paragraph) Signed and unsigned integer
    variables are fully supported.  ... intptr_t and uintptr_t variables have
    the same number of bits of precision as the native size of a pointer in
    the underlying implementation.


    (Insert new section immediately before Section 4.1.10, Implicit
    Conversions, p. 27)

    Section 4.1.X, Pointers

    Pointers are 64-bit unsigned integer values that represent the address of
    some "global" memory (i.e. not local to this invocation of a shader).
    Pointers to any of the transparent types, user-defined structures, or
    pointer types are supported.  Pointers are dereferenced with the operators
    (*), (->), and ([]) and a variety of operators performing addition and
    subtraction are supported.  There is no mechanism to assign a pointer to
    the address of a local variable or array, nor is there a mechanism to
    allocate or free memory from within a shader.  There are no function
    pointers.

    The underlying memory read using pointer variables may also be accessed
    using the OpenGL API commands.  To communicate between shaders and other
    OpenGL API commands, variables read through pointers are arranged in
    memory in the manner described in Section 2.20.X of the OpenGL
    Specification.


    Modify Section 4.1.10, Implicit Conversions, p. 27

    (add before the final paragraph of the section, p. 27) 

    Pointers to any type may be implicitly converted to pointers to void.
    Pointers to any type (including void) are never implicitly converted to
    pointers to any other non-void type.


    Modify Section 5.1, Operators, p. 39

    (add new entries to the precedence table; for a full spec, renumber the
    new precedence row "3.5" to "4", and renumber all subsequent rows)

    Precedence  Operator Class               Operators    Associativity
    ----------  --------------------------   ---------    -------------
      2         field access from pointer       ->        left to right
      3         pointer dereference             *         right to left
      3.5       typecast                        ()        right to left    

    (modify the last paragraph, p.39, to delete language saying that
     dereferences and typecast operators are not supported)  

    There is no address-of operator.


    (Insert new section immediately after Section 5.7, Structure and Array
     Operations, p. 46)

    Section 5.X, Pointer Operations

    The following operators are allowed to operate on pointer types:

        pointer dereference                     *
        additive                                + -
        array subscript                         []
        arithmetic assignments                  += -=
        postfix increment and decrement         ++ --
        prefix increment and decrement          ++ --
        equality                                == !=
        assignment                              =
        field or method selector                ->

    The pointer dereference operator is a unary operator that converts a
    pointer expression into an l-value designating data of the type pointed to
    by the pointer expression.  The result of a pointer dereference may not be
    used as the left-hand side of an assignment.

    The pointer binary addition (+) and subtraction (-) operators produce a
    pointer result from one pointer operand and one scalar signed or unsigned
    integer operand.  For subtraction, the pointer must be the first operand;
    for addition, the pointer may be either operand.  The type of the result
    is the same type as the pointer operand.  A new pointer is computed by
    adding or subtracting <I>*<S> basic machine units to the value of the
    pointer operand, where <I> is the integer operand and <S> is the stride
    that would be derived by applying the rules specified in Section 2.20.X of
    the OpenGL Specification to an array with elements of the type pointed to
    by the pointer.
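    The address arithmetic above can be sketched in C by modeling GPU
    addresses as 64-bit unsigned integers.  The helper names ptr_add and
    ptr_sub are hypothetical, and the stride is passed in explicitly (for
    example, 4 for a float pointee or 16 for a vec3 pointee, matching the
    vec3 column stride in Example 1); this is an illustration only, not part
    of the API.

```c
      #include <stdint.h>

      /* Models "P + I": advance a GPU address by I elements of <stride>
         bytes.  Converting a negative I to uint64_t and relying on
         unsigned wraparound steps the address backwards. */
      static uint64_t ptr_add(uint64_t p, int64_t i, uint64_t stride)
      {
          return p + (uint64_t)i * stride;
      }

      /* Models "P - I": move a GPU address back by I elements. */
      static uint64_t ptr_sub(uint64_t p, int64_t i, uint64_t stride)
      {
          return p - (uint64_t)i * stride;
      }
```

    For instance, adding 3 to a vec3 pointer at address 0x1000 yields
    0x1030, because each element step is 16 bytes.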

    The binary subtraction (-) operator may also operate on a pair of pointers
    of identical type.  In this operation, the second operand is subtracted
    from the first, yielding a signed integer result of type <intptr_t>.  The
    result is in units of the type being pointed to.  The result is the
    integer value that would yield the first pointer operand if added to the
    second pointer operand in the manner described above.  If no such integer
    value exists, the result of the operation is undefined.  Pointer
    subtraction is not supported for pointers to the type <void>.
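    Pointer subtraction inverts that computation: the signed byte distance
    divided by the stride.  In the hypothetical C model below, addresses are
    64-bit unsigned integers and ptr_diff is an illustrative name; for the
    case the spec leaves undefined (a distance that is not a multiple of the
    stride), this sketch simply truncates.

```c
      #include <stdint.h>

      /* Models "P - Q" for two pointers of identical type whose pointee
         has an array stride of <stride> bytes.  The result is the integer
         that, added to Q, yields P.  If the byte distance is not a
         multiple of the stride, the spec leaves the result undefined;
         this sketch just truncates toward zero. */
      static int64_t ptr_diff(uint64_t p, uint64_t q, uint64_t stride)
      {
          int64_t bytes = (int64_t)(p - q);   /* signed byte distance */
          return bytes / (int64_t)stride;
      }
```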

    The array subscript operator ([]) adds a signed or unsigned integer
    expression specified inside the brackets to a pointer expression specified
    to the left of the brackets, and then dereferences the pointer produced by
    the addition.  The array subscript operation "P[i]" is functionally
    equivalent to "(*(P+i))".

    The add-into (+=) and subtract-from (-=) operators are binary operations,
    where the first operand must be one that could be assigned to (an
    l-value) and the second operand must be a signed or unsigned integer
    scalar.  These operations add the integer operand into or subtract the
    integer operand from the pointer operand, as defined for pointer addition
    and subtraction.

    The arithmetic unary operators post- and pre-increment and decrement (++
    and --) operate on pointers.  For post- and pre-increment and decrement,
    the expression must be one that could be assigned to (an l-value).  Pre-
    and post-increment and decrement add 1 to or subtract 1 from the contents
    of the expression they operate on, as defined for pointer addition and
    subtraction.  The value of the pre-increment or pre-decrement expression
    is the resulting value of that modification.  The value of the
    post-increment or post-decrement expression is the value of the expression
    before modification.
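    These operators follow C pointer arithmetic.  For a float pointee, C's
    element size (4 bytes) equals the stride the layout rules give a float
    array, so the C sketch below walks memory the same way a GLSL float
    pointer would.  The helpers sum_floats and second_element are
    hypothetical and exist only to illustrate the operators.

```c
      #include <stddef.h>

      /* Sums <n> floats using pointer +, !=, dereference, and
         post-increment: "*p++" reads the element at the old address,
         then advances p by one element. */
      static float sum_floats(const float *p, ptrdiff_t n)
      {
          const float *end = p + n;   /* n elements past the start */
          float sum = 0.0f;
          while (p != end)
              sum += *p++;
          return sum;
      }

      /* Pre-increment yields the already-advanced pointer, so "*++q"
         reads the same element as "p[1]", i.e. "*(p + 1)". */
      static float second_element(const float *p)
      {
          const float *q = p;
          return *++q;
      }
```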

    The equality operators equal (==) and not equal (!=) operate on pointer
    types and produce a scalar Boolean result.  The two operands must either
    be pointers to the same type, or one of the two operands must point to
    void.  Two pointers are considered equal if and only if they point to the
    same global memory address.

    The field or method selection operator (->) operates on a pointer to a
    structure of any type and is used to select a field of the structure
    pointed to by the pointer.  This selector also operates on a pointer to a
    vector of any type, where the right-hand side of the operator must be a
    valid component selection suffix as described in Section 5.5.  In both
    cases, the field or method selection operation "p->s" is functionally
    equivalent to "((*p).s)".

    Pointer addition and subtraction, including the add into, subtract from,
    and pre- and post-increment and decrement operators, are not supported on
    pointers to a void type.

    The assignment operator may be used to update the value of a pointer
    variable, as described in Section 5.8.


    (Insert after Section 5.10, Vector and Matrix Operations, p. 50)

    Section 5.11, Typecast Operations

    The typecast operator may be used to convert an expression from one type
    to another, operating in a manner similar to scalar, vector, and matrix
    constructors.  The typecast operator specifies a new data type in
    parentheses, followed by an expression, as in the following examples:

      float a = (float) 2U;
      vec3 b = (vec3) 1.0;
      vec4 c = (vec4) b;
      mat2 d = (mat2) 1.0;
      mat4 e = (mat4) d;

    For scalar, vector, and matrix data types, the set of typecasts supported
    is equivalent to the set of single-operand constructors supported, and a
    typecast operates identically to an equivalent constructor.  A scalar
    expression may be typecast to any scalar, vector, or matrix data type.  A
    vector expression may be typecast to any vector type, except vectors with
    a larger number of components.  Four-component vector expressions may also
    be typecast to a mat2 type.  A matrix expression may be typecast to any
    other matrix data type.

    Expressions with structure type may only be typecast to a structure of
    identical type, which has no effect.  Typecast operators are not supported
    for array types.

    Note that the typecast operator takes only a single expression.  Unlike
    constructors, typecasts cannot be used to generate a vector, structure, or
    matrix from multiple inputs.  For example,

      vec3 f = (vec3) (1.0, 2.0, 3.0);

    generates a three-component vector <f>, but all three components are set
    to 3.0, which is the scalar value of the expression "(1.0, 2.0, 3.0)".
    The commas in that expression are sequence operators, not list
    delimiters.
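    C's comma operator behaves the same way, so the pitfall can be
    reproduced there.  In the sketch below, vec3 is a hypothetical stand-in
    for the GLSL type, vec3_from_scalar models the single-operand typecast
    (replicating the scalar, as the vec3(s) constructor does), and counted
    records each evaluation to show that every operand is evaluated even
    though only the last value is used.

```c
      /* Hypothetical stand-in for the GLSL vec3 type. */
      typedef struct { float x, y, z; } vec3;

      /* Models "(vec3) s": the scalar is replicated into every component. */
      static vec3 vec3_from_scalar(float s)
      {
          vec3 v = {s, s, s};
          return v;
      }

      /* Returns <value> and counts how many times it was called. */
      static float counted(float value, int *evaluations)
      {
          ++*evaluations;
          return value;
      }

      /* C analog of "(vec3) (1.0, 2.0, 3.0)": the inner commas are comma
         operators, so all three operands are evaluated left to right, but
         only the last value, 3.0, reaches the typecast. */
      static vec3 comma_typecast(int *evaluations)
      {
          *evaluations = 0;
          return vec3_from_scalar((counted(1.0f, evaluations),
                                   counted(2.0f, evaluations),
                                   counted(3.0f, evaluations)));
      }
```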

    Typecast operators may also be used to cast values to a pointer type.  In
    this case, the expression being typecast must be either a pointer (to any
    type) or a scalar of type intptr_t or uintptr_t.

      vec4      *v4ptr;
      intptr_t  iptr;
      vec3      *v3ptr = (vec3 *) v4ptr;
      ivec2     *iv2ptr = (ivec2 *) iptr;

    Note that function call-style constructors are not supported for pointers.


    Add to the end of Section 8.3, Common Functions, p. 72

    (add support for pointer packing functions)

    Syntax:

      void *packPtr(uvec2 a);
      uvec2 unpackPtr(void *a);

    The function packPtr() returns a pointer to void by constructing a 64-bit
    void pointer from the two 32-bit components of an unsigned integer vector.
    The first vector component specifies the 32 least significant bits of the
    pointer; the second component specifies the 32 most significant bits.

    The function unpackPtr() returns a two-component unsigned integer vector
    built from a 64-bit void pointer.  The first component of the vector
    consists of the 32 least significant bits of the pointer value; the second
    component consists of the 32 most significant bits.
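    The bit layout of packPtr() and unpackPtr() can be sketched in C by
    treating the pointer as a 64-bit unsigned integer.  The uvec2 struct and
    the pack_ptr/unpack_ptr names are hypothetical stand-ins for the GLSL
    type and built-ins.  The same low/high split is what the application
    performs in Example 2 when it passes a GPU address through two 32-bit
    attribute components.

```c
      #include <stdint.h>

      /* Hypothetical stand-in for GLSL's uvec2. */
      typedef struct { uint32_t x, y; } uvec2;

      /* packPtr(): x holds the 32 least significant bits,
         y holds the 32 most significant bits. */
      static uint64_t pack_ptr(uvec2 a)
      {
          return (uint64_t)a.x | ((uint64_t)a.y << 32);
      }

      /* unpackPtr(): the inverse split. */
      static uvec2 unpack_ptr(uint64_t p)
      {
          uvec2 r;
          r.x = (uint32_t)(p & 0xFFFFFFFFu);
          r.y = (uint32_t)(p >> 32);
          return r;
      }
```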


    Modify Chapter 9, Shading Language Grammar, p.92

    (change comment in the grammar disallowing pointer dereferences)

    Change the sentence:

      // Grammar Note: No '*' or '&' unary ops. Pointers are not supported.

    to

      // Grammar Note: No '&' unary.


Additions to the AGL/EGL/GLX/WGL Specifications

    None

Errors

    INVALID_ENUM is generated by MakeBufferResidentNV if <access> is not
    READ_ONLY.
    
    INVALID_ENUM is generated by GetBufferParameterui64vNV if <pname> is
    not BUFFER_GPU_ADDRESS_NV.

    INVALID_OPERATION is generated by MakeBufferResidentNV,
    MakeBufferNonResidentNV, IsBufferResidentNV, and GetBufferParameterui64vNV
    if no buffer is bound to <target>.

    INVALID_OPERATION is generated by MakeBufferResidentNV if the buffer bound
    to <target> is already resident in the current GL context.

    INVALID_OPERATION is generated by MakeBufferNonResidentNV if the buffer
    bound to <target> is not resident in the current GL context.

    INVALID_OPERATION is generated by MakeNamedBufferResidentNV if <buffer> is
    already resident in the current GL context.

    INVALID_OPERATION is generated by MakeNamedBufferNonResidentNV if <buffer>
    is not resident in the current GL context.

    INVALID_OPERATION is generated by GetBufferParameterui64vNV or
    MakeBufferResidentNV if the buffer bound to <target> has no data store.

    INVALID_OPERATION is generated by GetNamedBufferParameterui64vNV or
    MakeNamedBufferResidentNV if <buffer> has no data store.
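    The residency errors above amount to a small per-context state machine.
    The C sketch below models only MakeBufferResidentNV and
    MakeBufferNonResidentNV for the buffer bound to <target>; every type and
    function name in it is hypothetical.  Because the error list does not say
    which error takes precedence when several apply, the order of the checks
    is an assumption.  Following the usual GL convention, a command that
    generates an error has no other effect.

```c
      #include <stdbool.h>

      /* Hypothetical stand-ins for the GL error enums. */
      typedef enum { ERR_NONE, ERR_INVALID_ENUM, ERR_INVALID_OPERATION } ErrorCode;

      /* Hypothetical view of the buffer bound to <target> in the current
         context. */
      typedef struct {
          bool bound;      /* some buffer is bound to <target>      */
          bool has_store;  /* that buffer has a data store          */
          bool resident;   /* it is resident in the current context */
      } BoundBuffer;

      /* MakeBufferResidentNV(target, access).  Check order is assumed;
         on error, no state changes. */
      static ErrorCode make_buffer_resident(BoundBuffer *b, bool access_is_read_only)
      {
          if (!access_is_read_only)
              return ERR_INVALID_ENUM;
          if (!b->bound || !b->has_store || b->resident)
              return ERR_INVALID_OPERATION;
          b->resident = true;
          return ERR_NONE;
      }

      /* MakeBufferNonResidentNV(target). */
      static ErrorCode make_buffer_non_resident(BoundBuffer *b)
      {
          if (!b->bound || !b->resident)
              return ERR_INVALID_OPERATION;
          b->resident = false;
          return ERR_NONE;
      }
```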

Examples

    (1) Layout of a complex structure using the rules from the new Section
        2.20.X added to the OpenGL spec:

    struct  Example {
                    // bytes used            rules
      float a;      //  0-3                  
      vec2 b;       //  8-15                 1   // bumped to a multiple of 8
      vec3 c;       //  16-27                1
      struct {
        int d;      //  32-35                2   // bumped to a multiple of 8 (bvec2)
        bvec2 e;    //  40-47                1
      } f;
      float g;      //  48-51                
      float h[2];   //  52-55 (h[0])         5   // multiple of 4 (float) with no additional padding
                    //  56-59 (h[1])         6   // tightly packed
      mat2x3 i;     //  64-75 (i[0])         
                    //  80-91 (i[1])         6   // bumped to a multiple of 16 (vec3)
      struct {
        uvec3 j;    //   96-107 (m[0].j)     
        vec2 k;     //  112-119 (m[0].k)     1   // bumped to a multiple of 8 (vec2)
        float l[2]; //  120-123 (m[0].l[0])  1,5 // simply float aligned
                    //  124-127 (m[0].l[1])  6   // tightly packed
                    //  128-139 (m[1].j)
                    //  144-151 (m[1].k)
                    //  152-155 (m[1].l[0])
                    //  156-159 (m[1].l[1])
      } m[2];
    };
    // sizeof(Example) == 160
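    The offsets in example (1) can be reproduced with a small layout
    calculator.  Section 2.20.X is not reproduced here, so the rules in this
    C sketch are assumptions inferred from Issue 20 and from the worked
    example itself: scalars (including bool) are 4 bytes; a vec3 is aligned
    like a vec4; array elements sit at a stride of the element size rounded
    up to its alignment; matrices are arrays of column vectors; and a
    structure is aligned like its most-aligned member, with its size padded
    to that alignment.  All names are hypothetical.

```c
      #include <stddef.h>

      /* Size and required alignment of a type, in bytes. */
      typedef struct { size_t size, align; } Layout;

      /* Running state while placing the members of one structure. */
      typedef struct { size_t offset, align; } StructBuilder;

      static size_t align_up(size_t offset, size_t align)
      {
          return (offset + align - 1) / align * align;
      }

      /* Assumption: float, int, uint, and bool are all 4 bytes. */
      static Layout scalar(void)
      {
          Layout l = {4, 4};
          return l;
      }

      /* n-component vector; a vec3 is aligned like a vec4 (Issue 20). */
      static Layout vec(size_t n)
      {
          Layout l = {4 * n, n == 3 ? 16 : 4 * n};
          return l;
      }

      /* Assumption: stride = element size rounded up to its alignment. */
      static Layout array(Layout elem, size_t count)
      {
          Layout l = {align_up(elem.size, elem.align) * count, elem.align};
          return l;
      }

      /* Places a member at the next aligned offset and returns that offset. */
      static size_t add_member(StructBuilder *s, Layout m)
      {
          size_t at = align_up(s->offset, m.align);
          s->offset = at + m.size;
          if (m.align > s->align)
              s->align = m.align;
          return at;
      }

      /* Assumption: a structure is aligned like its most-aligned member
         and its size is padded to a multiple of that alignment. */
      static Layout finish(const StructBuilder *s)
      {
          Layout l = {align_up(s->offset, s->align), s->align};
          return l;
      }

      /* Offsets of the top-level members of struct Example, plus its size. */
      typedef struct { size_t a, b, c, f, g, h, i, m, total; } ExampleOffsets;

      static ExampleOffsets layout_example(void)
      {
          StructBuilder f = {0, 1}, m = {0, 1}, ex = {0, 1};
          ExampleOffsets o;

          add_member(&f, scalar());                 /* int d      */
          add_member(&f, vec(2));                   /* bvec2 e    */

          add_member(&m, vec(3));                   /* uvec3 j    */
          add_member(&m, vec(2));                   /* vec2 k     */
          add_member(&m, array(scalar(), 2));       /* float l[2] */

          o.a = add_member(&ex, scalar());
          o.b = add_member(&ex, vec(2));
          o.c = add_member(&ex, vec(3));
          o.f = add_member(&ex, finish(&f));
          o.g = add_member(&ex, scalar());
          o.h = add_member(&ex, array(scalar(), 2));
          o.i = add_member(&ex, array(vec(3), 2));  /* mat2x3 columns */
          o.m = add_member(&ex, array(finish(&m), 2));
          o.total = finish(&ex).size;
          return o;
      }
```

    Under these assumptions the calculator places f at 32, i at 64, and m at
    96, and gives a total size of 160, agreeing with the comments above.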

    (2) Replacing bindable_uniform with an array of pointers:

        #version 130
        #extension GL_NV_shader_buffer_load : require
        #extension GL_EXT_bindable_uniform : require

        in vec4 **ptr;
        in uvec2 whichbuf;

        void main() {
            gl_FrontColor = ptr[whichbuf.x][whichbuf.y];
            gl_Position = ftransform();
        }

        In the GL code, assuming the buffer object setup in the Overview:

        glBindAttribLocation(program, 8, "ptr");    
        glBindAttribLocation(program, 9, "whichbuf");    
        glLinkProgram(program);
        glBegin(...);
        glVertexAttribI2uiEXT(8, (GLuint)pointerBufferAddr,
                                 (GLuint)(pointerBufferAddr >> 32));
        for (i = ...) {
            for (j = ...) {
                glVertexAttribI2uiEXT(9, i, j);
                glVertex3f(...);
            }
        }
        glEnd();


New State

    Update Table 6.11, p. 349 (Buffer Object State)

    Get Value                   Type    Get Command                  Initial Value   Sec     Attribute
    ---------                   ----    -----------                  -------------   ---     ---------
    BUFFER_GPU_ADDRESS_NV       Z64+    GetBufferParameterui64vNV    0               2.9     none

    Update Table 6.46, p. 384 (Implementation Dependent Values)

    Get Value                     Type    Get Command                  Minimum Value   Sec     Attribute
    ---------                     ----    -----------                  -------------   ---     ---------
    MAX_SHADER_BUFFER_ADDRESS_NV  Z64+    GetIntegerui64vNV            0xFFFFFFFF      2.X.2   none

Dependencies on NV_gpu_program4:

    This extension is generally written against the NV_gpu_program4 
    wording, program grammar, etc., but doesn't have specific 
    dependencies on its functionality. 

    
Issues

    1) Only buffer objects?

    RESOLVED: YES, for now. Buffer objects are unformatted memory and 
    easily mapped to a "pointer"-style shading language. 

    2) Should we allow writes?

    RESOLVED: NO, deferred to a later extension. Writes involve 
    specifying many kinds of synchronization primitives. Writes are also
    a "side effect" which makes program execution "observable" in cases
    where it may not have otherwise been (e.g. early-Z can kill fragments
    before shading, or a post-transform cache may prevent vertex program
    execution).
    
    3) What happens if an invalid pointer is fetched?

    UNRESOLVED: Unpredictable results, including program termination?
    Make the driver trap the error and report it (still unpredictable
    results, but no program termination)? My preference would be to 
    at least report the faulting address (roughly), whether it was 
    a read or a write, and which shader stage faulted. I'd like to not 
    terminate the program, but the app has to assume all their data 
    stored in the GL is lost.

    4) What should this extension be named?

    RESOLVED: NV_shader_buffer_load. Rather than trying to choose an
    overly-general name and naming future extensions "GL_XXX2", let's 
    name this according to the specific functionality it provides.

    5) What are the performance characteristics of buffer loads?

    RESOLVED: Likely somewhere between uniforms and texture fetches, 
    but entirely implementation-dependent. Uniforms still serve a purpose
    for "program locals". Buffer loads may have different caching 
    behavior than either uniforms or texture fetches, but the expectation
    is that they will be cached reads of memory, and all the common-sense
    guidelines for maintaining locality of reference apply.

    6) What does MakeBufferResidentNV do? Why not just have a 
    MapBufferGPUNV?

    RESOLVED: Reserving virtual address space only requires knowing the 
    size of the data store, so an explicit MapBufferGPU call isn't 
    necessary. If all GPUs supported demand paging, a GPU address might
    be sufficient, but without that assumption MakeBufferResidentNV serves
    as a hint to the driver that it needs to page-lock memory, download 
    the buffer contents into GPU-accessible memory, or other similar 
    preparation. MapBufferGPU would also imply that a different address
    may be returned each time it is mapped, which could be cumbersome
    for the application to handle.

    7) Is it an error to render while any resident buffer is mapped?
    
    RESOLVED: No. As the number of attachment points in the context grows,
    even the existing error check is falling out of favor.

    8) Does MapBuffer stall on pending use of a resident buffer?

    RESOLVED: No. The existing language is:
    
        "If the GL is able to map the buffer object's data store into the 
         client's address space, MapBuffer returns the pointer value to 
         the data store once all pending operations on that buffer have
         completed."

    However, since the implementation has no information about how the 
    buffer is used, "all pending operations" amounts to a Finish. In 
    terms of sharing across contexts/threads, ARB_vertex_buffer_object 
    says:

        "How is synchronization enforced when buffer objects are shared by
         multiple OpenGL contexts?

         RESOLVED: It is generally the clients' responsibility to
         synchronize modifications made to shared buffer objects."

    So we shouldn't dictate any additional shared object synchronization,
    and the best we could do is a Finish. It's not clear that this 
    accomplishes anything for the application, since they can just as 
    easily call Finish themselves. Or, if they don't want synchronization,
    they can use MAP_UNSYNCHRONIZED_BIT. The resolution therefore seems
    inconsequential, as GL already provides the tools to achieve either 
    behavior. Hence, don't bother stalling.

    However, if a buffer was previously resident and has since been made 
    non-resident, the implementation should enforce the stalling 
    behavior for those pending operations from before it was made non-
    resident.

    9) Given issue (8), what are some effective ways to load data into 
    a buffer that is resident?

    RESOLVED: There are several possibilities:

    - BufferSubData.
    
    - The application may track using Fences which parts of the buffer 
      are actually in use and update them with CPU writes using 
      MAP_UNSYNCHRONIZED_BIT. This is potentially error-prone, as 
      described in ARB_copy_buffer.

    - CopyBufferSubData. ARB_copy_buffer describes a simple usage example
      for a single-threaded application. Since this extension is targeted
      at reducing the CPU bottleneck in the rendering thread, offloading
      some of the work to other threads may be useful.

      Example with a single Loading thread and Rendering thread:

          Loading thread:
              while (1) {
                  WaitForEvent(something to do);

                  NamedBufferData(tempBuffer, updateSize, NULL, STREAM_DRAW);
                  ptr = MapNamedBuffer(tempBuffer, WRITE_ONLY);
                  // fill ptr
                  UnmapNamedBuffer(tempBuffer);
                  // the buffer could have been filled via BufferData, if 
                  // that's more natural.
                  
                  // send tempBuffer name to Rendering thread
              }
          Rendering thread:
              foreach (obj in scene) {
                  if (obj has changed) {
                      // get tempBuffer name from Loading thread
                      
                      NamedCopyBufferSubData(tempBuffer, objBuffer, 0, objOffset, updateSize);
                  }
                  Draw(obj);
              }

      If we further desire to offload the data transfer to another 
      thread, and the implementation supports concurrent data transfers 
      in one context/thread while rendering in another context/thread, 
      this may also be accomplished thusly:

          Loading thread:
              while (1) {
                  WaitForEvent(something to do);

                  NamedBufferData(sysBuffer, updateSize, NULL, STREAM_DRAW);
                  ptr = MapNamedBuffer(sysBuffer, WRITE_ONLY);
                  // fill ptr
                  UnmapNamedBuffer(sysBuffer);
                  
                  NamedBufferData(vidBuffer, updateSize, NULL, STREAM_COPY);
                  // This is a sysmem->vidmem blit.
                  NamedCopyBufferSubData(sysBuffer, vidBuffer, 0, 0, updateSize);
                  SetFence(fenceId, ALL_COMPLETED);

                  // send vidBuffer name and fenceId to Rendering thread

                  // This could have been a BufferSubData directly into 
                  // vidBuffer, if that's more natural.
              }
          Rendering thread:
              foreach (obj in scene) {
                  if (obj has changed) {
                      // get vidBuffer name and fenceId from Loading thread
                      
                      // Note: fences are not currently shareable across
                      // contexts, so in practice this thread needs to ask
                      // the loading thread when it has finished.
                      FinishFence(fenceId);
                      
                      // This is hopefully a fast vidmem->vidmem blit.
                      NamedCopyBufferSubData(vidBuffer, objBuffer, 0, objOffset, updateSize);
                  }
                  Draw(obj);
              }

      In both of these examples, the point at which the data is written to 
      the resident buffer's data store is clearly specified in order
      with rendering commands. This avoids a whole class of 
      synchronization bugs (write-after-read hazards) to which 
      MAP_UNSYNCHRONIZED_BIT is prone.

    10) What happens if BufferData is called on a buffer that is resident? 
    
    RESOLVED: BufferData is specified to "delete the existing data store", 
    so the GPU address of that data should become invalid. The buffer is
    therefore made non-resident in the current context.

    11) Should residency be a property of the buffer object, or should
    a buffer be "made resident to a context"?

    RESOLVED: Made resident to a context. If a shared buffer is used in 
    two threads/contexts, it may be difficult for the application to know 
    when the residency state actually changes on the shared object 
    particularly if there is a large latency between commands being 
    submitted on the client and processed on the server. Allowing the 
    buffer to be made resident to each context individually allows the 
    state to be reliably toggled in-order in each command stream. This 
    also allows MakeBufferNonResidentNV to serve as an indication to the GL
    that the buffer is no longer in use in each command stream.

    This leads to an unfortunate orphaning issue. For example, if the 
    buffer is resident in context A and then deleted in context B, how 
    can the app make it non-resident in context A? Given the name-based 
    object model, it is impossible. It would be complex from an 
    implementation point of view for DeleteBuffers (or BufferData) to 
    either make it non-resident or throw an error if it is resident in 
    some other context. 
    
    An ideal solution would be a (separate) extension that allows the 
    application to increment the refcount on the object and to decrement
    the refcount without necessarily deleting the object's name. Until 
    such an extension exists, the unsatisfying proposed resolution is that
    a buffer can be "stuck" resident until the context is deleted. Note 
    that DeleteBuffers should make the buffer non-resident in the context 
    that does the delete, so this problem only applies to rare multi-
    context corner cases.

    12) Is there any value in requiring an "immutable structure" bit of 
    state to be set in order to query the address? 
    
    RESOLVED: NO. Given that the BufferData behavior is fairly 
    straightforward to specify and implement, it's not clear that this 
    would be useful.

    13) What should the program syntax look like?

    RESOLVED: Support 1-, 2-, and 4-component vector fetches of
    float/int/uint types, as well as 8- and 16-bit int/uint fetches via a
    new LOAD instruction with a slew of suffixes. Handling 8- and 16-bit
    sizes will be useful for high-level languages compiling to the
    assembly. Addresses are required to be a multiple of the size of the
    data, as some implementations may require this.

    Other options include a more x86-style pointer dereference 
    ("MOV R0, DWORD PTR[R1];") or a complement to program.local
    ("MOV R0, program.global[R1];"), but neither of these provides the
    simple granularity of the explicit type suffixes, and a new 
    instruction is convenient to implement and avoids muddling the clean
    definition of MOV.

    14) How does the GL know to invalidate caches when data has changed?

    RESOLVED: Any entry points that can write to buffer objects should 
    trigger the necessary invalidation. A new entry point may only be 
    necessary once there is a way to write to a buffer by GPU address.

    15) Does this extension require 64-bit register/operation support in 
        programs and shaders?

    RESOLVED: NO. At the API level, GPU addresses are always 64-bit values,
    and when they are stored in uniforms, attribs, parameters, etc. they
    should always be stored at full precision. However, if programs and 
    shaders don't support 64-bit registers/operations via another 
    programmability extension, then they will need to use only 32 bits.
    On such implementations, the usable address space is therefore limited
    to 4GB. Such a limit should be reflected in the value of 
    MAX_SHADER_BUFFER_ADDRESS_NV.

    It is expected that GLSL shaders will be compiled in such a way as to 
    generate 64-bit pointers on implementations that support them and 32-bit
    pointers on implementations that don't. So GLSL shaders written against
    a 32-bit implementation can be expected to be forward-compatible when 
    run against a 64-bit implementation. The (u)intptr_t types are provided
    to ease this compatibility.
    
    Built-in functions are provided to convert pointers to and from a pair
    of integers. These can be used to pass pointers as two components of a
    generic attrib, to construct a pointer from an RG32UI texture fetch, 
    or to write a pointer to a fragment shader output.

    16) What assumption can applications make about the alignment of 
    addresses returned by GetBufferParameterui64vNV?

    RESOLVED: All buffers will begin at an address that is a multiple of 
    16 bytes.
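    Together with the requirement in Issue 13 that load addresses be a
    multiple of the size of the data, this means that for fetch sizes that
    divide 16 (1, 2, 4, 8, and 16 bytes), whether a fetch is aligned depends
    only on its offset within the buffer.  A hypothetical C helper that
    checks the combined address:

```c
      #include <stdbool.h>
      #include <stdint.h>

      /* True if a fetch of <fetch_size> bytes at <base> + <offset> is
         naturally aligned.  <base> is a buffer's GPU address, which this
         issue guarantees to be a multiple of 16. */
      static bool fetch_is_aligned(uint64_t base, uint64_t offset,
                                   uint64_t fetch_size)
      {
          return (base + offset) % fetch_size == 0;
      }
```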

    17) How can the application guarantee that the layout of a structure
        on the CPU matches the layout used by the GLSL compiler?

    RESOLVED: Provide a standard set of packing rules designed around 
    naturally aligning simple types. This spec will define pointer fetches
    in GLSL to use these rules, but does not explicitly guarantee that 
    other extensions (like EXT_bindable_uniform) will use the same packing
    rules for their buffer object fetches. These packing rules are 
    different from the ARB_uniform_buffer_object rules - in particular, 
    these rules do not require vec4 padding of the array stride.

    18) Is the address space per-context, per-share-group, or global?

    RESOLVED: It is per-share-group. Using addresses from one share group
    in another share group will cause undefined results.

    19) Is there risk of using invalid pointers for "killed" fragments, 
    fragments that don't take a certain branch of an "if" block, or 
    fragments whose shader is conceptually never executed due to pixel 
    ownership, stipple, etc.?

    RESOLVED: NO. OpenGL implementations sometimes run fragment programs 
    on "helper" pixels that have no coverage, or continue to run fragment
    programs on killed pixels in order to be able to compute sane partial
    derivatives for fragment program instructions (DDX, DDY) or automatic
    level-of-detail calculations for texturing.  In this approach,
    derivatives are approximated by computing the difference in a quantity
    computed for a given fragment at (x,y) and a fragment at a neighboring
    pixel.  When a fragment program is executed on a "helper" pixel or 
    killed pixel, global loads may not be executed in order to prevent 
    spurious faults. Helper pixels aren't explicitly mentioned in the spec 
    body; instead, partial derivatives are obtained by magic.

    If a fragment program contains a KIL instruction, compilers may not
    reorder code such that a LOAD instruction is executed before a KIL
    instruction that logically precedes it in flow control.  Once a 
    fragment is killed, subsequent loads should never be executed if they
    could cause any observable side effects.

    As a result, if a shader uses instructions that explicitly or 
    implicitly do LOD calculations dependent on the result of a global 
    load, those instructions will have undefined results.

    20) How are structures and arrays stored in buffer object memory?

    RESOLVED:  Individual structure members and array elements are stored
    "packed" in memory, subject to an alignment requirement.  Structure
    members are stored according to the order of declaration.  Array elements
    are stored consecutively by element number.  Unreferenced structure
    members or array elements are never eliminated.  

    The alignment requirement of individual structure members or array
    elements is usually equal to the size of the item.  For the purposes of
    this requirement, vector types are treated atomically (i.e., a "vec4" with
    32-bit floats will be 16-byte aligned).  One exception is that the
    required alignment of three-component vectors is the same as the required
    alignment of a four-component vector of the same base type.

    21) How do the memory layout rules relate to the similar layout rules
    specified for the uniform buffer object (UBO) feature incorporated in
    OpenGL 3.1?

    RESOLVED:  This extension was completed prior to OpenGL 3.1, but the
    layout rules for this extension and for UBO were developed roughly
    concurrently.  The layout rules here are nearly identical to those for the
    "std140" layout for uniform blocks.  The main difference here is that
    "std140" requires arrays of small types (e.g., "float") to be padded out
    to vec4 alignment (16B), while this extension does not.
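    The difference in array strides can be sketched as two C helpers.  The
    std140 rule used here (the array stride rounded up to the 16-byte
    alignment of a vec4) is taken from the OpenGL 3.1 uniform block layout;
    the function names are hypothetical.

```c
      #include <stddef.h>

      static size_t round_up(size_t value, size_t multiple)
      {
          return (value + multiple - 1) / multiple * multiple;
      }

      /* This extension: element size rounded up to the element alignment. */
      static size_t nv_array_stride(size_t elem_size, size_t elem_align)
      {
          return round_up(elem_size, elem_align);
      }

      /* std140: additionally rounded up to vec4 alignment (16 bytes). */
      static size_t std140_array_stride(size_t elem_size, size_t elem_align)
      {
          return round_up(round_up(elem_size, elem_align), 16);
      }
```

    A float array therefore has a 4-byte stride here but a 16-byte stride
    under std140, while a vec3 array has a 16-byte stride under both.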

    Note that this extension does NOT allow shaders to use the layout()
    qualifier added by GLSL 1.40 to achieve fine-grained control of structure
    or array layout using pointers.  A subsequent extension could provide this
    capability.

    22) Should we provide a mechanism for tighter packing of an array of
    three-component vectors?

    RESOLVED:  This could be desirable, but it won't be provided in this
    extension.  A subsequent extension could support alternate layouts by
    allowing shaders to use the GLSL 1.40 layout() qualifier on pointer
    types.

    If tight packing of vec3s is strongly required, a three-component array
    element could be constructed using three single-component loads or by
    selecting/swizzling components of one or more larger loads.  The former
    technique could be done using GLSL by replacing:

      vec3 *pointer;
      vec3 elementN;
      int n;
      elementN = pointer[n];

    with

      float *pointer;
      vec3 elementN;
      int n;
      elementN = vec3(pointer[n*3], pointer[n*3+1], pointer[n*3+2]);
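    The same technique reads as follows in C, which makes the two strides
    explicit: element n of a tightly packed array starts 12*n bytes in,
    whereas a vec3 pointer under this extension steps 16 bytes per element.
    The vec3 struct and helper names are hypothetical.

```c
      #include <stddef.h>

      /* Hypothetical stand-in for the GLSL vec3 type. */
      typedef struct { float x, y, z; } vec3;

      /* Element n of a tightly packed vec3 array, read as three
         consecutive floats, mirroring the GLSL rewrite above. */
      static vec3 load_tight_vec3(const float *pointer, size_t n)
      {
          vec3 v = {pointer[n * 3], pointer[n * 3 + 1], pointer[n * 3 + 2]};
          return v;
      }

      /* Byte offset of element n: 12-byte stride when tightly packed,
         16-byte stride when accessed through a vec3 pointer. */
      static size_t tight_vec3_offset(size_t n)  { return n * 3 * sizeof(float); }
      static size_t padded_vec3_offset(size_t n) { return n * 16; }
```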


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     8    08/06/10  istewart  Modify behavior of named buffer functions
                              to match those of EXT_direct_state_access.
                              Add INVALID_OPERATION error to 
                              MakeBufferResidentNV and GetBufferParameterui64vNV
                              if the buffer object has no data store.

     7    06/22/10  pbrown    Document INVALID_OPERATION errors on 
                              residency management and query APIs when a
                              non-existent buffer object is referenced, 
                              when trying to make an already resident buffer
                              resident, or when trying to make an already
                              non-resident buffer non-resident.

     6    09/21/09  groth     Fix non-conformant DSA function names.

     5    09/10/09  Jon Leech Add 'const' to type of Uniformui64vNV and
                              ProgramUniformui64vNV 'count' argument.

     4    09/09/09  mjk       Fix typos

     3    08/21/09  pbrown    Add explicit spec language describing the
                              typecast operator implemented here.  The
                              previous spec language said it was allowed
                              but didn't say what it did.

     2    08/05/09  pbrown    Update section describing memory layout of
                              variables pointed to; moved to the core
                              specification as with OpenGL 3.1's uniform
                              buffer layout.  Added a few issues on memory
                              layout.  Explicitly documented the set of
                              operations and implicit conversions allowed 
                              on pointers.

     1              jbolz     Internal revisions.
