

15.3.4 Value Types


   Value types are built from OMG IDL’s value type definitions. Their representation and encoding are defined in this section.

   Value types may be used to transmit and encode complex state. The general approach is to support the transmission of the data (state) and type information encoded as RepositoryIDs.

   The loading (and possible transmission) of code is outside of the scope of the GIOP definition, but enough information is carried to support it, via the CodeBase object.

   The format makes a provision for the support of custom marshaling (i.e., the encoding and transmission of state using application-defined code). Consistency between custom encoders and decoders is not ensured by the protocol.

   The encoding supports all of the features of value types, including the “chunking” of value types, and does so in a compact way.

   At a high level the format can be described as the linearization of a graph. The graph is the depth-first exploration of the transitive closure that starts at the top-level value object and follows its “reference to value objects” fields (an ordinary remote reference is just written as an IOR). It is a recursive encoding similar to the one used for TypeCodes. An indirection is used to point to a value that has already been encoded.

   The data members are written beginning with the highest possible base type to the most derived type in the order of their declaration.

   2. Accordingly, in cases where encapsulated data holds data with natural alignment of greater than four octets, some processors may need to copy the octet data before removing it from the encapsulation. For example, an appropriate way to deal with long long discriminator type in an encapsulation for a union TypeCode is to encode the body of the encapsulation as if it was aligned at the 8 byte boundary, and then copy the encoded value into the encapsulation. This may result in long long data values inside the encapsulation being aligned on only a 4 byte boundary when viewed from outside the encapsulation.

   15.3.4.1 Partial Type Information and Versioning

   The format provides support for partial type information and versioning issues in the receiving context. However, the encoding has been designed so that this information is only required when “advanced features” such as truncation are used.

   The presence (or absence) of type information and codebase URL information is indicated by flags within the <value_tag>, which is a long in the range between 0x7fffff00 and 0x7fffffff inclusive. The last octet of this tag is interpreted as follows:

   When a list of RepositoryIDs is present, the encoding is a long specifying the number of RepositoryIDs, followed by the RepositoryIDs. The first RepositoryID is the id for the most derived type of the value. If this type has any base types, the sending context is responsible for listing the RepositoryIDs for all the base types to which it is safe to truncate the value passed. These truncatable base types are listed in order, going up the derivation hierarchy. The sending context may choose to (but need not) terminate the list at any point after it has sent a RepositoryID for a type well-known to the receiving context. A well-known type is any of the following:

   For value types that have an RMI: RepositoryId, ORBs must include at least the most derived RepositoryId in the value type encoding.

   For value types marshaled as abstract interfaces (see Section 15.3.7, “Abstract Interfaces,” on page 15-30), RepositoryId information must be included in the value type encoding.

   If the receiving context needs more typing information than is contained in a GIOP message that contains codebase URL information, it can go back to the sending context and perform a lookup based on that RepositoryID to retrieve more typing information (e.g., the type graph).

   CORBA RepositoryIDs may contain standard version identification (major and minor version numbers, or hash code information). The ORB run time may use this information to check whether the version of the value being transmitted is compatible with the version expected. In the event of a version mismatch, the ORB may apply product-specific truncation/conversion rules (with the help of a local interface repository or the SendingContext::RunTime service). For example, the Java serialization model of truncation/conversion across versions can be supported. See the JDK 1.1 documentation for a detailed specification of this model.
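
   The RepositoryID list layout described in this subsection (a long count followed by the RepositoryIDs, most derived type first) can be sketched as follows. This is a minimal illustration rather than an ORB implementation. It assumes big-endian byte order, a buffer that starts on an aligned boundary, and the usual CDR string form (an unsigned long length that counts the terminating NUL). The function names and the two RepositoryID strings are hypothetical.

```python
import struct


def encode_string(buf: bytearray, text: str) -> None:
    """Append a CDR string: aligned unsigned long length (incl. NUL), then the octets."""
    while len(buf) % 4:              # longs are aligned on 4-octet boundaries
        buf.append(0)
    data = text.encode("ascii") + b"\x00"
    buf += struct.pack(">I", len(data)) + data


def encode_repository_id_list(ids: list) -> bytes:
    """Encode a long holding the number of RepositoryIDs, followed by each id."""
    buf = bytearray()
    buf += struct.pack(">i", len(ids))
    for repo_id in ids:              # most derived type first, then truncatable bases
        encode_string(buf, repo_id)
    return bytes(buf)


# Hypothetical ids for a value that may be truncated from horse to animal.
encoded = encode_repository_id_list(["IDL:horse:1.0", "IDL:animal:1.0"])
```

   The second string begins at offset 24 rather than 22 because its length field must be aligned on a 4-octet boundary.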

   15.3.4.2 Example

   The following examples demonstrate legal combinations of truncatability, actual parameter types and GIOP encodings. This is not intended to be an exhaustive list of legal possibilities.

   The following example uses valuetypes animal and horse, where horse is derived from animal. The actual parameters passed to the specified operations are an_animal of runtime type animal and a_horse of runtime type horse.

   The following combinations of truncatability, actual parameter types and GIOP encodings are legal.

   1. If there is a single operation:

    op1(in animal a);

   a) If the type horse cannot be truncated to animal (i.e., horse is declared):

    valuetype horse: animal ...

   then the encoding is as shown below:

   Actual Invocation    Legal Encoding

   op1(a_horse)         2 horse
                        6 1 horse

   Note that if the type horse is not available to the receiver, then the receiver throws a demarshaling exception.

   b) If the type horse can be truncated to animal (i.e., horse is declared):

    valuetype horse: truncatable animal ...

   then the encoding is as shown below:

   Actual Invocation    Legal Encoding

   op1(a_horse)         6 2 horse animal

   Note that if the type horse is not available to the receiver, then the receiver tries to truncate to animal.

   c) Regardless of the truncation relationships, when the exact type of the formal argument is sent, then the encoding is as shown below:

   Actual Invocation    Legal Encoding

   op1(an_animal)       0
                        2 animal
                        6 1 animal

   2. Given the additional operation:

    op2(in horse h);

   (i.e., the sender knows that both types horse and animal and their derivation relationship are known to the receiver)

   a) If the type horse cannot be truncated to animal (i.e., horse is declared):

   valuetype horse: animal ...

   then the encoding is as shown below:

   Actual Invocation    Legal Encoding

   op2(a_horse)         2 horse
                        6 1 horse

   Note that the demarshaling exception of case 1 will not occur, since horse is available to the receiver.

   b) If the type horse can be truncated to animal (i.e., horse is declared):

    valuetype horse: truncatable animal ...

   then the encoding is as shown below:

   Actual Invocation    Legal Encoding

   op2(a_horse)         2 horse
                        6 1 horse
                        6 2 horse animal

   Note that truncation will not occur, since horse is available to the receiver.

   15.3.4.3 Scope of the Indirections

   The special value 0xffffffff introduces an indirection (i.e., it directs the decoder to go somewhere else in the marshaling buffer to find what it is looking for). This can be codebase URL information that has already been encoded, a RepositoryID that has already been encoded, a list of repository IDs that has already been encoded, or another value object that is shared in a graph. 0xffffffff is always followed by a long indicating where to go in the buffer. A repositoryID or URL that is the target of an indirection used for encoding a valuetype must have been introduced as the type or codebase information for a valuetype.

   It is not permissible for a repositoryID marshaled for some purpose other than as the type information of a valuetype to use indirection to reference a previously marshaled value. The encoding used to indicate an indirection is the same as that used for recursive TypeCodes (i.e., a 0xffffffff indirection marker followed by a long offset (in units of octets) from the beginning of the long offset). As an example, this means that an offset of negative four (-4) is illegal, because it is self-indirecting to its indirection marker. Indirections may refer to any preceding location in the GIOP message, including previous fragments if fragmentation is used. This includes any previously marshaled parameters. Non-negative offsets are reserved for future use. Indirections may not cross encapsulation boundaries.

   Fragmentation support in GIOP versions 1.1, 1.2, and 1.3 introduces the possibility of a header for a FragmentMessage being marshaled between the target of an indirection and the start of the encapsulation containing the indirection. The octets occupied by any such headers are not included in the calculation of the offset value.
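
   The offset rules above can be illustrated with a short decoder-side sketch. It assumes a big-endian buffer that holds the message body as one contiguous run of octets with any FragmentMessage headers already removed, so that those headers do not enter the offset calculation. The function name is hypothetical, and encapsulation boundaries are not checked.

```python
import struct

INDIRECTION_TAG = 0xFFFFFFFF


def resolve_indirection(buf: bytes, tag_pos: int) -> int:
    """Return the buffer position that the indirection starting at tag_pos refers to."""
    (tag,) = struct.unpack_from(">I", buf, tag_pos)
    if tag != INDIRECTION_TAG:
        raise ValueError("no indirection marker at this position")
    offset_pos = tag_pos + 4
    (offset,) = struct.unpack_from(">i", buf, offset_pos)
    if offset >= 0:
        raise ValueError("non-negative offsets are reserved for future use")
    if offset == -4:
        raise ValueError("offset -4 refers to the indirection's own marker")
    target = offset_pos + offset     # measured from the start of the offset long
    if target < 0:
        raise ValueError("indirection points before the start of the buffer")
    return target


# Eight octets of earlier data, then an indirection back to position 0.
buf = b"\x7f\xff\xff\x02" + b"\x00" * 4 + struct.pack(">Ii", INDIRECTION_TAG, -12)
target = resolve_indirection(buf, 8)
```

   Here the offset long sits at position 12, so an offset of -12 refers back to position 0.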

   15.3.4.4 Null Values

   All value types have a distinguished “null.” All null values are encoded by the <null_tag> (0x0). The CDR encoding of null values includes no type information.

   15.3.4.5 Other Encoding Information

   A “new” value is coded as a value header followed by the value’s state. The header contains a tag and codebase URL information if appropriate, followed by the RepositoryID and an octet flag of bits. Because the same RepositoryID (and codebase URL information) could be repeated many times in a single request when sending a complex graph, they are encoded as a regular string the first time they appear, and use an indirection for later occurrences.
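
   The "string first, indirection afterwards" rule for RepositoryIDs can be sketched on the sending side as follows. This is an illustrative sketch only. It assumes big-endian byte order and a buffer that starts at the beginning of the message, and the class name and the RepositoryID string are hypothetical.

```python
import struct


class RepositoryIdWriter:
    """Writes a RepositoryID as a string once and as an indirection afterwards."""

    def __init__(self):
        self.buf = bytearray()
        self.seen = {}               # RepositoryID -> position of its first encoding

    def _align4(self):
        while len(self.buf) % 4:
            self.buf.append(0)

    def write_repository_id(self, repo_id: str) -> None:
        self._align4()
        first = self.seen.get(repo_id)
        if first is None:
            # First occurrence: regular CDR string (length incl. NUL, then octets).
            self.seen[repo_id] = len(self.buf)
            data = repo_id.encode("ascii") + b"\x00"
            self.buf += struct.pack(">I", len(data)) + data
        else:
            # Later occurrence: marker, then offset counted from the offset long.
            self.buf += struct.pack(">I", 0xFFFFFFFF)
            offset_pos = len(self.buf)
            self.buf += struct.pack(">i", first - offset_pos)


writer = RepositoryIdWriter()
writer.write_repository_id("IDL:horse:1.0")   # written as a string at position 0
writer.write_repository_id("IDL:horse:1.0")   # written as an indirection
```

   The same bookkeeping would apply to codebase URL information, which follows the same rule.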

   15.3.4.6 Fragmentation

   It is anticipated that value types may be rather large, particularly when a graph is being transmitted. Hence the encoding supports the breaking up of the serialization into an arbitrary number of chunks in order to facilitate incremental processing.

   Values with truncatable base types need a length indication in case the receiver needs to truncate them to a base type. Value types that are custom marshaled also need a length indication so that the ORB run time can know exactly where they end in the stream without relying on user-defined code. This allows the ORB to maintain consistency and ensure the integrity of the GIOP stream when the user-written custom marshaling and demarshaling does not marshal the entire value state. For simplicity of encoding, we use a length indication for all values whether or not they have a truncatable base type or use custom marshaling.

   If limited space is available for marshaling, it may be necessary for the ORB to send the contents of a marshaling buffer containing a partially marshaled value as a GIOP fragment. At that point in the marshaling, the length of the entire value being marshaled may not be known. Calculating this length may require processing as costly as marshaling the entire value. It is therefore desirable to allow the value to be encoded as multiple chunks, each with its own length. This allows the portion of a value that occupies a marshaling buffer to be sent as a chunk of known length with no need for additional length calculation processing.

   The data may be split into multiple chunks at arbitrary points except within primitive CDR types, arrays of primitive types, strings, and wstrings, or between the tag and offset of indirections. It is never necessary to end a chunk within one of these types as the length of these types is known before starting to marshal them so they can be added to the length of the currently open chunk. It is the responsibility of the CDR stream to hide the chunking from the marshaling code.

   The presence (or absence) of chunking is indicated by flags within the <value_tag>. The fourth least significant bit (<value_tag> & 0x00000008) is the value 1 if a chunked encoding is used for the value’s state. The chunked encoding is required for custom marshaling and truncation. If this bit is 0, the state is encoded as <octets>.
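
   A decoder might inspect the <value_tag> along the following lines. Only the valid range of the tag and the chunking bit (0x08) are stated in this text. The positions used below for the codebase URL flag (0x01) and the type information field (0x06, with the values 0, 2, and 6 that lead the encodings in the example section) are assumptions made for illustration, as is the function name.

```python
def decode_value_tag(tag: int) -> dict:
    """Split a <value_tag> into the flags carried in its last octet."""
    if not 0x7FFFFF00 <= tag <= 0x7FFFFFFF:
        raise ValueError("not a <value_tag>")
    flags = tag & 0xFF
    return {
        # Assumption: the lowest bit signals codebase URL information.
        "codebase_url": bool(flags & 0x01),
        # Assumption: 0 = no type information, 2 = single RepositoryID, 6 = list.
        "type_info": flags & 0x06,
        # Stated in the text: fourth least significant bit marks chunked state.
        "chunked": bool(flags & 0x08),
    }


info = decode_value_tag(0x7FFFFF0E)
```

   With these assumptions, the tag 0x7fffff0e describes a chunked value carrying a list of RepositoryIDs and no codebase URL information.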

   Each chunk is preceded by a positive long, which specifies the number of octets in the chunk.
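
   The chunk layout can be illustrated with a small writer that packs already-marshaled pieces into length-prefixed chunks. It is a sketch under simplifying assumptions: each piece is an indivisible unit (a primitive, a string, and so on) that has already been encoded, every piece is a multiple of four octets long so that alignment can be ignored, byte order is big-endian, and the function name and the size limit are hypothetical. End tags are not written.

```python
import struct


def write_chunks(items, max_chunk: int) -> bytes:
    """Pack indivisible marshaled items into chunks, each preceded by its length."""
    out = bytearray()
    current = bytearray()

    def flush():
        if current:
            out.extend(struct.pack(">i", len(current)))   # positive chunk length
            out.extend(current)
            current.clear()

    for item in items:
        # Never split an item: close the open chunk if the item would not fit.
        if current and len(current) + len(item) > max_chunk:
            flush()
        current.extend(item)
    flush()
    return bytes(out)


longs = [struct.pack(">i", n) for n in (1, 2, 3)]
stream = write_chunks(longs, max_chunk=8)
```

   With a limit of eight octets, the three longs are written as one chunk of length 8 followed by one chunk of length 4.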

   A chunked value is terminated by an end tag that is a non-positive long so the start of the next value can be differentiated from the start of another chunk. In the case of values that contain other values (e.g., a linked list) the “nested” value is started without there being an end tag. The absolute value of an end tag (when it finally appears) indicates the nesting level of the value being terminated. A single end tag can be used to terminate multiple nested values. The detailed rules are as follows:

   for future use (e.g., supporting a nesting depth of more than 2^31). The outermost value type will always be terminated by an end tag with a value of -1. Enclosing non-chunked valuetypes are not considered when determining the nesting depth.

   The following example describes how end tags may be used. Consider a valuetype declaration that contains two member values:

   // IDL
   valuetype simpleNode { ... };
   valuetype node : truncatable simpleNode {
       public node node1;
       public node node2;
   };

   When an instance of type ‘node’ is passed as a parameter of type ‘simpleNode’ a chunked encoding is used. In all cases, the outermost value is terminated with an end tag with a value of -1. The nested value ‘node1’ is terminated with an end tag with a value of -2 since only the second-level value ‘node1’ ends at that point. Since the nested value ‘node2’ coterminates with the outermost value, either of the following end tag layouts is legal:

   Because data members are encoded in their declaration order, declaring a value type data member of a value type last is likely to result in more compact encoding on the wire because it maximizes the number of values ending at the same place and so allows a single end tag to be used for multiple values. The canonical example for that is a linked list.
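
   The way a single end tag can close several values at once can be sketched with two small helpers. The sketch tracks only the nesting depth of chunked values, with the outermost value at depth 1. Treating an end tag of 0 as invalid is an assumption of this sketch, and the function names are hypothetical.

```python
def end_tag_for(depth: int) -> int:
    """End tag that terminates the value at `depth` and every value nested inside it."""
    if depth < 1:
        raise ValueError("the outermost value has nesting depth 1")
    return -depth


def apply_end_tag(open_depth: int, tag: int) -> int:
    """Return the number of values still open after reading the end tag `tag`."""
    if tag >= 0:
        # Assumption: 0 is not used as an end tag in this sketch.
        raise ValueError("an end tag must be negative here")
    closed_depth = -tag
    if closed_depth > open_depth:
        raise ValueError("end tag refers to a value that is not open")
    return closed_depth - 1          # everything at closed_depth or deeper is ended


# Linked list of three chunked nodes, each nested in the previous one:
remaining = apply_end_tag(3, end_tag_for(1))   # one end tag of -1 closes all three
```

   For the node example, `apply_end_tag(2, -2)` leaves the outermost value open after node1, while a later `apply_end_tag(2, -1)` ends node2 and the outermost value together.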

   Truncating a value type in the receiving context may require keeping track of unused nested values (only during unmarshaling) in case further indirection tags point back to them. These values can be held in their “raw” GIOP form, as fully unmarshaled value objects, or in any other product-specific form.

   Value types that are custom marshaled are encoded as chunks in order to let the ORB run-time know exactly where they end in the stream without relying on user-defined code.

   15.3.4.7 Notation

   The on-the-wire format is described by a BNF grammar with conventions similar to the ones used to define IDL syntax. The terminals of the grammar are to be interpreted differently. We are describing a protocol format. Although the terminals have the same names as IDL tokens they represent either:

   For example, long is a shorthand for the GIOP encoding of the IDL long data type with all the GIOP alignment rules. Similarly struct is a shorthand for the GIOP CDR encoding of a struct.

   A (type) constant means that an instance of the given type having the given value is encoded according to the rules for that type. Thus, (long) 0 means that a CDR encoding for a long having the value 0 appears at that location.
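
   As a concrete reading of this notation, (long) 0 could be produced by a helper such as the one below. It is a minimal sketch that assumes the buffer starts at the origin used for alignment. The function name and the `big_endian` parameter are hypothetical.

```python
import struct


def encode_long(buf: bytearray, value: int, big_endian: bool = True) -> None:
    """Append a CDR long: pad to a 4-octet boundary, then write four octets."""
    while len(buf) % 4:
        buf.append(0)
    buf += struct.pack(">i" if big_endian else "<i", value)


buf = bytearray(b"\x01")             # one octet already in the stream
encode_long(buf, 0)                  # the grammar's "(long) 0"
```

   Three padding octets are inserted before the long so that it starts on a 4-octet boundary.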

   15.3.4.8 The Format

   The concatenated octets of consecutive value chunks within a value encode state members for the value according to the following grammar:

(1) <state members> ::= <state_member>
                      | <state_member> <state members>
(2) <state_member>  ::= <value_ref>
                      // All legal IDL types should be here
                      | octet
                      | boolean
                      | char
                      | short
                      | unsigned short
                      | long
                      | unsigned long
                      | float
                      | wchar
                      | wstring
                      | string
                      | struct
                      | union
                      | sequence
                      | array
                      | Object
                      | any
                      | long long
                      | unsigned long long
                      | double
                      | long double
                      | fixed