

13.10.5 Relevant OSFM Registry Interfaces


   13.10.5.1 Character and Code Set Registry

   The OSF character and code set registry is defined in OSF Character and Code Set Registry (see References in the Preface). Current registry contents may be obtained directly from the Open Software Foundation via anonymous FTP from ftp.opengroup.org:/pub/code_set_registry. The registry has two parts: character sets and code sets. For each listed code set, the registry shows the set of character sets that the code set encodes.

   Each 32-bit code set value consists of a high-order 16-bit organization number and a 16-bit identification of the code set within that organization. As the numbering of organizations starts with 0x0001, a code set null value (0x00000000) may be used to indicate an unknown code set.

   When associating character sets and code sets, OSF uses the concept of “fuzzy equality,” meaning that a code set is shown as encoding a particular character set if the code set can encode “most” of the characters in that character set.

   “Compatibility” between two code sets is determined by examining their entries in the registry, paying special attention to the character sets encoded by each code set. The character sets listed for the two code sets are compared to see whether there is at least one (fuzzy-defined) character set in common; if such a character set is found, the two code sets are assumed to be “compatible.” Because the association is fuzzy, applications that rely on parts of a character set that a code set does not properly encode will suffer information loss when communicating with another application under this “fuzzy” scheme.

   The ORB is responsible for accessing the OSF registry and determining “compatibility” based on the information returned.

   OSF members and other organizations can request additions to both the character set and code set registries by email to cs-registry@opengroup.org. In particular, one range of the code set registry (0xf5000000 through 0xffffffff) is reserved for organizations to use in identifying sets that are not registered with the OSF, although such use does not facilitate interoperability without registration.

   13.10.5.2 Access Routines

   The following routines access the OSF character and code set registry. They map a code set string name to a code set ID and vice versa, and they help determine character set compatibility. These routine interfaces, their semantics, and their actual implementation are not normative (i.e., ORB vendors do not have to bundle the OSF registry implementation with their products for compliance).

   The following routines are adopted from RPC Runtime Support For I18N Characters Functional Specification (see References in the Preface).

   dce_cs_loc_to_rgy

   Maps a local system-specific string name for a code set to a numeric code set value specified in the code set registry.

   Synopsis

   void dce_cs_loc_to_rgy(idl_char *local_code_set_name, unsigned32 *rgy_code_set_value, unsigned16 *rgy_char_sets_number, unsigned16 **rgy_char_sets_value, error_status_t *status);

   Parameters

   Input

   local_code_set_name - A string that specifies the name that the local host’s locale environment uses to refer to the code set. The string is a maximum of 32 bytes: 31 data bytes plus a terminating null character.

   Output

   rgy_code_set_value - The registered integer value that uniquely identifies the code set specified by local_code_set_name.

   rgy_char_sets_number - The number of character sets that the specified code set encodes. Specifying NULL prevents this routine from returning this value.

   rgy_char_sets_value - A pointer to an array of registered integer values that uniquely identify the character set(s) that the specified code set encodes. Specifying NULL prevents this routine from returning this value. The routine dynamically allocates this array.

   status - Returns the status code from this routine. This status code indicates whether the routine completed successfully or, if not, why not.

   The possible status codes and their meanings are as follows:

   Description

   The dce_cs_loc_to_rgy() routine maps operating system-specific names for character/code set encodings to their unique identifiers in the code set registry.

   The dce_cs_loc_to_rgy() routine takes as input a string that holds the host-specific “local name” of a code set and returns the corresponding integer value that uniquely identifies that code set, as registered in the host's code set registry. If no registry entry exists for that local name, the routine returns the status dce_cs_c_unknown.

   The routine also returns the number of character sets that the code set encodes and the registered integer values that uniquely identify those character sets. Specifying NULL in the rgy_char_sets_number and rgy_char_sets_value[] parameters prevents the routine from performing the additional search for these values. Applications that want only to obtain a code set value from the code set registry can specify NULL for these parameters to improve the routine's performance. If the array is returned, application developers should free it after use, since it is dynamically allocated.

   dce_cs_rgy_to_loc

   Maps a numeric code set value contained in the code set registry to the local system-specific name for a code set.

   Synopsis

   void dce_cs_rgy_to_loc(unsigned32 *rgy_code_set_value, idl_char **local_code_set_name, unsigned16 *rgy_char_sets_number, unsigned16 **rgy_char_sets_value, error_status_t *status);

   Parameters

   Input

   rgy_code_set_value - The registered hexadecimal value that uniquely identifies the code set.

   Output

   local_code_set_name - A string that specifies the name that the local host's locale environment uses to refer to the code set. The string is a maximum of 32 bytes: 31 data bytes and a terminating null character.

   rgy_char_sets_number - The number of character sets that the specified code set encodes. Specifying NULL in this parameter prevents the routine from returning this value.

   rgy_char_sets_value - A pointer to an array of registered integer values that uniquely identify the character set(s) that the specified code set encodes. Specifying NULL in this parameter prevents the routine from returning this value. The routine dynamically allocates this array.

   status - Returns the status code from this routine. This status code indicates whether the routine completed successfully or, if not, why not.

   The possible status codes and their meanings are as follows:

   Description

   The dce_cs_rgy_to_loc() routine maps a unique identifier for a code set in the code set registry to the operating system-specific string name for the code set, if it exists in the code set registry.

   The dce_cs_rgy_to_loc() routine takes as input the registered integer value of a code set and returns a string that holds the operating system-specific (local) name of the code set.

   If the code set identifier does not exist in the registry, the routine returns the status dce_cs_c_unknown and returns an undefined string.

   The routine also returns the number of character sets that the code set encodes and the registered integer values that uniquely identify those character sets. Specifying NULL in the rgy_char_sets_number and rgy_char_sets_value[] parameters prevents the routine from performing the additional search for these values. Applications that want only to obtain a local code set name from the code set registry can specify NULL for these parameters to improve the routine's performance. If the rgy_char_sets_value array is returned, application developers should free it after use.

   rpc_cs_char_set_compat_check

   Evaluates character set compatibility between a client and a server.

   Synopsis

   void rpc_cs_char_set_compat_check( unsigned32 client_rgy_code_set_value, unsigned32 server_rgy_code_set_value, error_status_t *status);

   Parameters

   Input

   client_rgy_code_set_value - The registered hexadecimal value that uniquely identifies the code set that the client is using as its local code set.

   server_rgy_code_set_value - The registered hexadecimal value that uniquely identifies the code set that the server is using as its local code set.

   Output

   status - Returns the status code from this routine. This status code indicates whether the routine completed successfully or, if not, why not.

   The possible status codes and their meanings are as follows:

   Description

   The rpc_cs_char_set_compat_check() routine provides a method for determining character set compatibility between a client and a server. If the server's character set is incompatible with that of the client, connecting to that server is most likely not acceptable, since massive data loss would result from such a connection.

   The routine takes the registered integer values that represent the code sets that the client and server are currently using and queries the code set registry to obtain the registered values that represent the character set(s) that the specified code sets support. If both client and server support just one character set, the routine compares the client and server registered character set values to determine whether or not the sets are compatible. If they are not, the routine returns the status code rpc_s_ss_no_compat_charsets.

   If the client and server support multiple character sets, the routine determines whether at least two of the sets match. If two or more sets match, the routine considers the character sets compatible and returns a success status code to the caller.

   rpc_rgy_get_max_bytes

   Gets, from the code set registry on a host, the maximum number of bytes that a code set uses to encode one character.

   Synopsis

   void rpc_rgy_get_max_bytes(unsigned32 rgy_code_set_value, unsigned16 *rgy_max_bytes, error_status_t *status);

   Parameters

   Input

   rgy_code_set_value - The registered hexadecimal value that uniquely identifies the code set.

   Output

   rgy_max_bytes - The registered decimal value that indicates the number of bytes this code set uses to encode one character.

   status - Returns the status code from this routine. This status code indicates whether the routine completed successfully or, if not, why not.

   The possible status codes and their meanings are as follows:

   Description

   The rpc_rgy_get_max_bytes() routine reads the code set registry on the local host. It takes the specified registered code set value, uses it as an index into the registry, and returns the decimal value that indicates the number of bytes that the code set uses to encode one character.

   This information can be used for buffer sizing as part of the procedure to determine whether additional storage needs to be allocated for conversion between local and network code sets.