std.utf

Encode and decode UTF-8, UTF-16 and UTF-32 strings.

UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'.


Category	Functions
Decode	decode decodeFront
Lazy decode	byCodeUnit byChar byWchar byDchar byUTF
Encode	encode toUTF8 toUTF16 toUTF32 toUTFz toUTF16z
Length	codeLength count stride strideBack
Index	toUCSindex toUTFindex
Validation	isValidDchar isValidCodepoint validate
Miscellaneous	replacementDchar UseReplacementDchar UTFException

License:

Boost License 1.0.

Authors:

Walter Bright and Jonathan M Davis

Source: std/utf.d

class UTFException: core.exception.UnicodeException;

Exception thrown on errors in std.utf functions.

Examples:

import std.exception : assertThrown;

char[4] buf;
assertThrown!UTFException(encode(buf, cast(dchar) 0xD800));
assertThrown!UTFException(encode(buf, cast(dchar) 0xDBFF));
assertThrown!UTFException(encode(buf, cast(dchar) 0xDC00));
assertThrown!UTFException(encode(buf, cast(dchar) 0xDFFF));
assertThrown!UTFException(encode(buf, cast(dchar) 0x110000));

pure nothrow @nogc @safe this(string msg, string file = __FILE__, size_t line = __LINE__, Throwable next = null);
pure nothrow @safe this(string msg, size_t index, string file = __FILE__, size_t line = __LINE__, Throwable next = null);

Standard exception constructors.

const string toString();

Returns:

A string detailing the invalid UTF sequence.

pure nothrow @nogc @safe bool isValidDchar(dchar c);

Checks whether the given Unicode code point is valid.

Parameters:

dchar c	the code point to check

Returns:

true if c is a valid Unicode code point.

Note: '\uFFFE' and '\uFFFF' are considered valid by isValidDchar, even though the Unicode standard designates them as noncharacters.

Examples:

assert( isValidDchar(cast(dchar) 0x41));
assert( isValidDchar(cast(dchar) 0x00));
assert(!isValidDchar(cast(dchar) 0xD800));
assert(!isValidDchar(cast(dchar) 0x11FFFF));
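
The boundaries of what isValidDchar accepts can be probed directly. The following is a minimal sketch, not part of the original documentation, showing that the noncharacters mentioned in the note pass while every surrogate and anything above 0x10FFFF fails. Numeric casts are used instead of character literals on the assumption that some compilers may reject such literals.

```d
import std.utf : isValidDchar;

void main()
{
    // The noncharacters 0xFFFE and 0xFFFF are accepted.
    assert(isValidDchar(cast(dchar) 0xFFFE));
    assert(isValidDchar(cast(dchar) 0xFFFF));

    // The highest code point is valid; one past it is not.
    assert(isValidDchar(cast(dchar) 0x10FFFF));
    assert(!isValidDchar(cast(dchar) 0x110000));

    // Every value in the surrogate range 0xD800 .. 0xDFFF is rejected.
    foreach (uint u; 0xD800 .. 0xE000)
        assert(!isValidDchar(cast(dchar) u));
}
```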

bool isValidCodepoint(Char)(Char c) if (isSomeChar!Char);

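Checks whether a single character forms a valid code point.

Standing alone, some characters are invalid code points. For example, the wchar 0xD800 is a so-called high surrogate, which can only be interpreted together with the low surrogate that follows it. As a standalone character it is considered invalid.

See D90, D91 and D92 of the Unicode Standard for details.

Parameters:

Char `c`	the character to test
Char	the character type of `c`

Returns:

true if c forms a valid code point.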

Examples:

assert( isValidCodepoint(cast(char) 0x40));
assert(!isValidCodepoint(cast(char) 0x80));
assert( isValidCodepoint(cast(wchar) 0x1234));
assert(!isValidCodepoint(cast(wchar) 0xD800));
assert( isValidCodepoint(cast(dchar) 0x0010FFFF));
assert(!isValidCodepoint(cast(dchar) 0x12345678));

uint stride(S)(auto ref S str, size_t index) if (is(S : const(char[])) || isRandomAccessRange!S && is(immutable(ElementType!S) == immutable(char)));
uint stride(S)(auto ref S str) if (is(S : const(char[])) || isInputRange!S && is(immutable(ElementType!S) == immutable(char)));
uint stride(S)(auto ref S str, size_t index) if (is(S : const(wchar[])) || isRandomAccessRange!S && is(immutable(ElementType!S) == immutable(wchar)));
pure @safe uint stride(S)(auto ref S str) if (is(S : const(wchar[])));
uint stride(S)(auto ref S str) if (isInputRange!S && is(immutable(ElementType!S) == immutable(wchar)) && !is(S : const(wchar[])));
uint stride(S)(auto ref S str, size_t index = 0) if (is(S : const(dchar[])) || isInputRange!S && is(immutable(ElementEncodingType!S) == immutable(dchar)));

Calculates the length of the UTF sequence starting at index in str.

Parameters:

S `str`	input range of UTF code units; must be random access if `index` is passed
size_t `index`	starting index of the UTF sequence (default: 0)

Returns:

The number of code units in the UTF sequence. For UTF-8, this is a value between 1 and 4 (as per RFC 3629, section 3). For UTF-16, it is either 1 or 2. For UTF-32, it is always 1.

Throws:

May throw a UTFException if str[index] is not the start of a valid UTF sequence.

Note: stride only analyzes the element at str[index]. It does not fully verify the validity of the UTF sequence, nor even its presence: it does not guarantee that index + stride(str, index) <= str.length.

Examples:

writeln("a".stride); // 1
writeln("λ".stride); // 2
writeln("aλ".stride); // 1
writeln("aλ".stride(1)); // 2
writeln("𐐷".stride); // 4
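
Because stride looks only at the first code unit, its result can point past the end of the data. The sketch below is illustrative only and uses hand-built byte arrays. It shows a truncated sequence for which stride still reports 2, and a stray continuation byte for which it throws.

```d
import std.exception : assertThrown;
import std.utf : stride, UTFException;

void main()
{
    // 0xC3 is a lead byte announcing a two-unit sequence,
    // but the continuation byte is missing.
    char[] truncated = [0x61, 0xC3];
    assert(stride(truncated, 1) == 2);

    // The reported length runs past the end of the array.
    assert(1 + stride(truncated, 1) > truncated.length);

    // 0x80 is a continuation byte, so it cannot start a sequence.
    char[] stray = [0x80];
    assertThrown!UTFException(stride(stray, 0));
}
```

Callers that need a guarantee about the whole sequence should use decode or validate instead.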

uint strideBack(S)(auto ref S str, size_t index) if (is(S : const(char[])) || isRandomAccessRange!S && is(immutable(ElementType!S) == immutable(char)));
uint strideBack(S)(auto ref S str) if (is(S : const(char[])) || isRandomAccessRange!S && hasLength!S && is(immutable(ElementType!S) == immutable(char)));
uint strideBack(S)(auto ref S str) if (isBidirectionalRange!S && is(immutable(ElementType!S) == immutable(char)) && !isRandomAccessRange!S);
uint strideBack(S)(auto ref S str, size_t index) if (is(S : const(wchar[])) || isRandomAccessRange!S && is(immutable(ElementType!S) == immutable(wchar)));
uint strideBack(S)(auto ref S str) if (is(S : const(wchar[])) || isBidirectionalRange!S && is(immutable(ElementType!S) == immutable(wchar)));
uint strideBack(S)(auto ref S str, size_t index) if (isRandomAccessRange!S && is(immutable(ElementEncodingType!S) == immutable(dchar)));
uint strideBack(S)(auto ref S str) if (isBidirectionalRange!S && is(immutable(ElementEncodingType!S) == immutable(dchar)));

Calculates the length of the UTF sequence ending one code unit before index in str.

Parameters:

S `str`	bidirectional range of UTF code units; must be random access if `index` is passed
size_t `index`	index one past the end of the UTF sequence (default: `str`.length)

Returns:

The number of code units in the UTF sequence. For UTF-8, this is a value between 1 and 4 (as per RFC 3629, section 3). For UTF-16, it is either 1 or 2. For UTF-32, it is always 1.

Throws:

May throw a UTFException if str[index] is not one past the end of a valid UTF sequence.

Note: strideBack only analyzes the element at str[index - 1]. It does not fully verify the validity of the UTF sequence, nor even its presence: it does not guarantee that strideBack(str, index) <= index.

Examples:

writeln("a".strideBack); // 1
writeln("λ".strideBack); // 2
writeln("aλ".strideBack); // 2
writeln("aλ".strideBack(1)); // 1
writeln("𐐷".strideBack); // 4
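
One way to use the index overload is to walk a string from its end without decoding. The helper below, sequenceLengthsFromBack, is a hypothetical name introduced for illustration; the sketch assumes the input is well-formed UTF-8, since strideBack does not validate it.

```d
import std.utf : strideBack;

/// Hypothetical helper: lengths of the UTF-8 sequences in `s`, last to first.
uint[] sequenceLengthsFromBack(string s)
{
    uint[] lengths;
    size_t end = s.length;
    while (end > 0)
    {
        // Length of the sequence that ends just before `end`.
        immutable n = strideBack(s, end);
        lengths ~= n;
        end -= n;
    }
    return lengths;
}

void main()
{
    // '𐐷' takes 4 code units, 'λ' takes 2, 'a' takes 1.
    assert(sequenceLengthsFromBack("aλ𐐷") == [4u, 2u, 1u]);
    assert(sequenceLengthsFromBack("").length == 0);
}
```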

pure @safe size_t toUCSindex(C)(const(C)[] str, size_t index) if (isSomeChar!C);

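Given index into str, and assuming that index is at the start of a UTF sequence, toUCSindex determines the number of UCS characters up to index. That is, index is the index of a code unit at the beginning of a code point, and the return value is how many code points into the string that code point is.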

Examples:

writeln(toUCSindex(`hello world`, 7)); // 7
writeln(toUCSindex(`hello world`w, 7)); // 7
writeln(toUCSindex(`hello world`d, 7)); // 7

writeln(toUCSindex(`Ma Chérie`, 7)); // 6
writeln(toUCSindex(`Ma Chérie`w, 7)); // 7
writeln(toUCSindex(`Ma Chérie`d, 7)); // 7

writeln(toUCSindex(`さいごの果実 / ミツバチと科学者`, 9)); // 3
writeln(toUCSindex(`さいごの果実 / ミツバチと科学者`w, 9)); // 9
writeln(toUCSindex(`さいごの果実 / ミツバチと科学者`d, 9)); // 9

pure @safe size_t toUTFindex(C)(const(C)[] str, size_t n) if (isSomeChar!C);

Given a UCS index n into str, returns the UTF index. That is, n is how many code points into the string the code point is, and the array index of its first code unit is returned.

Examples:

writeln(toUTFindex(`hello world`, 7)); // 7
writeln(toUTFindex(`hello world`w, 7)); // 7
writeln(toUTFindex(`hello world`d, 7)); // 7

writeln(toUTFindex(`Ma Chérie`, 6)); // 7
writeln(toUTFindex(`Ma Chérie`w, 7)); // 7
writeln(toUTFindex(`Ma Chérie`d, 7)); // 7

writeln(toUTFindex(`さいごの果実 / ミツバチと科学者`, 3)); // 9
writeln(toUTFindex(`さいごの果実 / ミツバチと科学者`w, 9)); // 9
writeln(toUTFindex(`さいごの果実 / ミツバチと科学者`d, 9)); // 9
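
The two functions are inverses of each other at code point boundaries, which makes toUTFindex handy for slicing a narrow string at the n-th code point. A small sketch using the string from the examples above:

```d
import std.utf : toUCSindex, toUTFindex;

void main()
{
    string s = "Ma Chérie";

    // 'r' is code point number 6 (zero-based); 'é' before it takes two code units.
    immutable unitIndex = toUTFindex(s, 6);
    assert(unitIndex == 7);

    // The code unit index is what array slicing expects.
    assert(s[unitIndex .. $] == "rie");

    // Converting back yields the original code point index.
    assert(toUCSindex(s, unitIndex) == 6);
}
```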

alias UseReplacementDchar = std.typecons.Flag!"useReplacementDchar".Flag;

Whether or not to replace invalid UTF with replacementDchar.

dchar decode(UseReplacementDchar useReplacementDchar = No.useReplacementDchar, S)(auto ref S str, ref size_t index) if (!isSomeString!S && isRandomAccessRange!S && hasSlicing!S && hasLength!S && isSomeChar!(ElementType!S));
pure @trusted dchar decode(UseReplacementDchar useReplacementDchar = No.useReplacementDchar, S)(auto ref scope S str, ref size_t index) if (isSomeString!S);

Decodes and returns the code point starting at str[index]. index is advanced to one past the decoded code point. If the code point is not well-formed, a UTFException is thrown and index remains unchanged.

decode only works with strings and with random-access ranges of code units that have length and slicing, whereas decodeFront works with any input range of code units.

Parameters:

useReplacementDchar	if the UTF is invalid, return replacementDchar rather than throwing
S `str`	input string or indexable range
size_t `index`	starting index into str; advanced by the number of code units processed

Returns:

The decoded character.

Throws:

UTFException if str[index] is not the start of a valid UTF sequence and useReplacementDchar is No.useReplacementDchar.

Examples:

size_t i;

assert("a".decode(i) == 'a' && i == 1);
i = 0;
assert("å".decode(i) == 'å' && i == 2);
i = 1;
assert("aå".decode(i) == 'å' && i == 3);
i = 0;
assert("å"w.decode(i) == 'å' && i == 1);

// ë as a multi-code point grapheme
i = 0;
assert("e\u0308".decode(i) == 'e' && i == 1);
// ë as a single code point grapheme
i = 0;
assert("ë".decode(i) == 'ë' && i == 2);
i = 0;
assert("ë"w.decode(i) == 'ë' && i == 1);
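
Since decode advances index itself, it can drive a loop over all code points of a string. The sketch below is illustrative: codePoints is a hypothetical helper name, and the truncated array is hand-built. It also contrasts the throwing default with Yes.useReplacementDchar.

```d
import std.exception : assertThrown;
import std.typecons : Yes;
import std.utf : decode, replacementDchar, UTFException;

/// Hypothetical helper: collects every code point of `s`.
dchar[] codePoints(string s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
        result ~= decode(s, i); // decode advances i past the code point
    return result;
}

void main()
{
    assert(codePoints("aå𐐷") == "aå𐐷"d);

    // A lead byte with its continuation byte missing.
    char[] truncated = [0x61, 0xC3];
    size_t i = 1;

    // By default the error is reported and the index is left untouched.
    assertThrown!UTFException(decode(truncated, i));
    assert(i == 1);

    // With the flag set, the replacement character is returned instead.
    assert(decode!(Yes.useReplacementDchar)(truncated, i) == replacementDchar);
}
```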

dchar decodeFront(UseReplacementDchar useReplacementDchar = No.useReplacementDchar, S)(ref S str, out size_t numCodeUnits) if (!isSomeString!S && isInputRange!S && isSomeChar!(ElementType!S));
pure @trusted dchar decodeFront(UseReplacementDchar useReplacementDchar = No.useReplacementDchar, S)(ref scope S str, out size_t numCodeUnits) if (isSomeString!S);
dchar decodeFront(UseReplacementDchar useReplacementDchar = No.useReplacementDchar, S)(ref S str) if (isInputRange!S && isSomeChar!(ElementType!S));

decodeFront is a variant of decode that specifically decodes the first code point. Unlike decode, decodeFront accepts any input range of code units (rather than just a string or random-access range). It also takes the range by ref and pops off the elements as it decodes them. If numCodeUnits is passed in, it is set to the number of code units in the code point that was decoded.

Parameters:

useReplacementDchar	if the UTF is invalid, return replacementDchar rather than throwing
S `str`	input string or input range of code units
size_t `numCodeUnits`	set to the number of code units processed

Returns:

The decoded character.

Throws:

UTFException if str.front is not the start of a valid UTF sequence. If an exception is thrown, there is no guarantee as to the number of code units that were popped off, as it depends on the type of range being used and on how many code units had to be popped off before the code point was determined to be invalid.

Examples:

import std.range.primitives;
string str = "Hello, World!";

assert(str.decodeFront == 'H' && str == "ello, World!");
str = "å";
assert(str.decodeFront == 'å' && str.empty);
str = "å";
size_t i;
assert(str.decodeFront(i) == 'å' && i == 2 && str.empty);

dchar decodeBack(UseReplacementDchar useReplacementDchar = No.useReplacementDchar, S)(ref S str, out size_t numCodeUnits) if (isSomeString!S);
dchar decodeBack(UseReplacementDchar useReplacementDchar = No.useReplacementDchar, S)(ref S str, out size_t numCodeUnits) if (!isSomeString!S && isSomeChar!(ElementType!S) && isBidirectionalRange!S && (isRandomAccessRange!S && hasLength!S || !isRandomAccessRange!S));
dchar decodeBack(UseReplacementDchar useReplacementDchar = No.useReplacementDchar, S)(ref S str) if (isSomeString!S || isRandomAccessRange!S && hasLength!S && isSomeChar!(ElementType!S) || !isRandomAccessRange!S && isBidirectionalRange!S && isSomeChar!(ElementType!S));

decodeBack is a variant of decode that specifically decodes the last code point. Unlike decode, decodeBack accepts any bidirectional range of code units (rather than just a string or random-access range). It also takes the range by ref and pops off the elements as it decodes them. If numCodeUnits is passed in, it is set to the number of code units in the code point that was decoded.

Parameters:

useReplacementDchar	if the UTF is invalid, return replacementDchar rather than throwing
S `str`	input string or bidirectional range
size_t `numCodeUnits`	set to the number of code units processed

Returns:

The decoded UTF character.

Throws:

UTFException if str.back is not the end of a valid UTF sequence. If an exception is thrown, str itself remains unchanged, but there is no guarantee as to the value of numCodeUnits (when passed).

Examples:

import std.range.primitives;
string str = "Hello, World!";

assert(str.decodeBack == '!' && str == "Hello, World");
str = "å";
assert(str.decodeBack == 'å' && str.empty);
str = "å";
size_t i;
assert(str.decodeBack(i) == 'å' && i == 2 && str.empty);
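
Because decodeBack pops what it decodes, calling it until the range is empty yields the code points in reverse order. A minimal sketch follows; codePointsReversed is a hypothetical helper, and the string is passed by value so the caller's copy is not consumed.

```d
import std.range.primitives : empty;
import std.utf : decodeBack;

/// Hypothetical helper: the code points of `s`, last to first.
dchar[] codePointsReversed(string s)
{
    dchar[] result;
    while (!s.empty)
        result ~= s.decodeBack; // pops the decoded code units off the back
    return result;
}

void main()
{
    string original = "aå𐐷";
    assert(codePointsReversed(original) == "𐐷åa"d);

    // Only the helper's local slice was shortened.
    assert(original == "aå𐐷");
}
```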

pure @safe size_t encode(UseReplacementDchar useReplacementDchar = No.useReplacementDchar)(out char[4] buf, dchar c);
pure @safe size_t encode(UseReplacementDchar useReplacementDchar = No.useReplacementDchar)(out wchar[2] buf, dchar c);
pure @safe size_t encode(UseReplacementDchar useReplacementDchar = No.useReplacementDchar)(out dchar[1] buf, dchar c);

Encodes c into the static array buf and returns the actual length of the encoded character (a number between 1 and 4 for char[4] buffers and a number between 1 and 2 for wchar[2] buffers).

Throws:

UTFException if c is not a valid UTF code point.

Examples:

import std.exception : assertThrown;
import std.typecons : Yes;

char[4] buf;

assert(encode(buf, '\u0000') == 1 && buf[0 .. 1] == "\u0000");
assert(encode(buf, '\u007F') == 1 && buf[0 .. 1] == "\u007F");
assert(encode(buf, '\u0080') == 2 && buf[0 .. 2] == "\u0080");
assert(encode(buf, '\uE000') == 3 && buf[0 .. 3] == "\uE000");
assert(encode(buf, 0xFFFE) == 3 && buf[0 .. 3] == "\xEF\xBF\xBE");
assertThrown!UTFException(encode(buf, cast(dchar) 0x110000));

encode!(Yes.useReplacementDchar)(buf, cast(dchar) 0x110000);
auto slice = buf[];
writeln(slice.decodeFront); // replacementDchar

Examples:

import std.exception : assertThrown;
import std.typecons : Yes;

wchar[2] buf;

assert(encode(buf, '\u0000') == 1 && buf[0 .. 1] == "\u0000");
assert(encode(buf, '\uD7FF') == 1 && buf[0 .. 1] == "\uD7FF");
assert(encode(buf, '\uE000') == 1 && buf[0 .. 1] == "\uE000");
assert(encode(buf, '\U00010000') == 2 && buf[0 .. 2] == "\U00010000");
assert(encode(buf, '\U0010FFFF') == 2 && buf[0 .. 2] == "\U0010FFFF");
assertThrown!UTFException(encode(buf, cast(dchar) 0xD800));

encode!(Yes.useReplacementDchar)(buf, cast(dchar) 0x110000);
auto slice = buf[];
writeln(slice.decodeFront); // replacementDchar

Examples:

import std.exception : assertThrown;
import std.typecons : Yes;

dchar[1] buf;

assert(encode(buf, '\u0000') == 1 && buf[0] == '\u0000');
assert(encode(buf, '\uD7FF') == 1 && buf[0] == '\uD7FF');
assert(encode(buf, '\uE000') == 1 && buf[0] == '\uE000');
assert(encode(buf, '\U0010FFFF') == 1 && buf[0] == '\U0010FFFF');
assertThrown!UTFException(encode(buf, cast(dchar) 0xD800));

encode!(Yes.useReplacementDchar)(buf, cast(dchar) 0x110000);
writeln(buf[0]); // replacementDchar

pure @safe void encode(UseReplacementDchar useReplacementDchar = No.useReplacementDchar)(ref scope char[] str, dchar c);
pure @safe void encode(UseReplacementDchar useReplacementDchar = No.useReplacementDchar)(ref scope wchar[] str, dchar c);
pure @safe void encode(UseReplacementDchar useReplacementDchar = No.useReplacementDchar)(ref scope dchar[] str, dchar c);

Encodes c in str's encoding and appends it to str.

Throws:

UTFException if c is not a valid UTF code point.

Examples:

char[] s = "abcd".dup;
dchar d1 = 'a';
dchar d2 = 'ø';

encode(s, d1);
writeln(s.length); // 5
writeln(s); // "abcda"
encode(s, d2);
writeln(s.length); // 7
writeln(s); // "abcdaø"
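
Appending one code point at a time is enough to transcode a whole string by hand. The following sketch, with hypothetical helpers encodeAllUTF8 and encodeAllUTF16, checks that doing so agrees with toUTF8 and toUTF16.

```d
import std.utf : encode, toUTF8, toUTF16;

/// Hypothetical helper: encodes every code point of `source` as UTF-8.
char[] encodeAllUTF8(dstring source)
{
    char[] result;
    foreach (dchar c; source)
        encode(result, c); // appends 1 to 4 code units
    return result;
}

/// Hypothetical helper: encodes every code point of `source` as UTF-16.
wchar[] encodeAllUTF16(dstring source)
{
    wchar[] result;
    foreach (dchar c; source)
        encode(result, c); // appends 1 or 2 code units
    return result;
}

void main()
{
    dstring source = "héllo 𐐷"d;

    assert(encodeAllUTF8(source) == source.toUTF8);
    assert(encodeAllUTF16(source) == source.toUTF16);

    // 'é' needs 2 UTF-8 units and '𐐷' needs 4; '𐐷' needs 2 UTF-16 units.
    assert(encodeAllUTF8(source).length == 11);
    assert(encodeAllUTF16(source).length == 8);
}
```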

pure nothrow @nogc @safe ubyte codeLength(C)(dchar c) if (isSomeChar!C);

Returns the number of code units required to encode the code point c when C is the character type used to encode it.

Examples:

writeln(codeLength!char('a')); // 1
writeln(codeLength!wchar('a')); // 1
writeln(codeLength!dchar('a')); // 1

writeln(codeLength!char('\U0010FFFF')); // 4
writeln(codeLength!wchar('\U0010FFFF')); // 2
writeln(codeLength!dchar('\U0010FFFF')); // 1

size_t codeLength(C, InputRange)(InputRange input) if (isSomeFiniteCharInputRange!InputRange);

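Returns the number of code units required to encode input in a string whose character type is C. This is particularly useful when slicing one string with the length of another and the two string types use different character types.

Parameters:

C	the character type to get the encoding length for
InputRange `input`	the input range to calculate the encoding length from

Returns:

The number of code units in input when encoded to C.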

Examples:

assert(codeLength!char("hello world") ==
       "hello world".length);
assert(codeLength!wchar("hello world") ==
       "hello world"w.length);
assert(codeLength!dchar("hello world") ==
       "hello world"d.length);

assert(codeLength!char(`プログラミング`) ==
       `プログラミング`.length);
assert(codeLength!wchar(`プログラミング`) ==
       `プログラミング`w.length);
assert(codeLength!dchar(`プログラミング`) ==
       `プログラミング`d.length);

string haystack = `Être sans la verité, ça, ce ne serait pas bien.`;
wstring needle = `Être sans la verité`;
assert(haystack[codeLength!char(needle) .. $] ==
       `, ça, ce ne serait pas bien.`);

pure @safe void validate(S)(in S str) if (isSomeString!S);

Checks whether str is well-formed Unicode.

Throws:

UTFException if str is not well-formed.

Examples:

import std.exception : assertThrown;
char[] a = [167, 133, 175];
assertThrown!UTFException(validate(a));
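
When a boolean answer is more convenient than an exception, validate can be wrapped. The function name isWellFormed below is an assumption made for this sketch and is not part of std.utf.

```d
import std.utf : validate, UTFException;

/// Hypothetical wrapper: true if `text` is well-formed UTF-8.
bool isWellFormed(const(char)[] text)
{
    try
    {
        validate(text);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    assert(isWellFormed("héllo"));
    assert(isWellFormed(""));

    // Three continuation bytes with no lead byte.
    char[] bad = [167, 133, 175];
    assert(!isWellFormed(bad));
}
```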

string toUTF8(S)(S s) if (isSomeFiniteCharInputRange!S);

Encodes the elements of s to UTF-8 and returns a newly allocated string of the elements.

Parameters:

S s	the string to encode

Returns:

A UTF-8 string.

See Also:

For a lazy, non-allocating version of these functions, see byUTF.

Examples:

import std.algorithm.comparison : equal;

// ø is represented by two UTF-8 code units
assert("Hellø"w.toUTF8.equal(['H', 'e', 'l', 'l', 0xC3, 0xB8]));

// 𐐷 is four code units in UTF-8
assert("𐐷"d.toUTF8.equal([0xF0, 0x90, 0x90, 0xB7]));

wstring toUTF16(S)(S s) if (isSomeFiniteCharInputRange!S);

Encodes the elements of s to UTF-16 and returns a newly GC-allocated wstring of the elements.

Parameters:

S s	the range to encode

Returns:

A UTF-16 string.

See Also:

For a lazy, non-allocating version of these functions, see byUTF.

Examples:

import std.algorithm.comparison : equal;

// these graphemes are two code units in UTF-16 and one in UTF-32
writeln("𤭢"d.length); // 1
writeln("𐐷"d.length); // 1

assert("𤭢"d.toUTF16.equal([0xD852, 0xDF62]));
assert("𐐷"d.toUTF16.equal([0xD801, 0xDC37]));

dstring toUTF32(S)(scope S s) if (isSomeFiniteCharInputRange!S);

Encodes the elements of s to UTF-32 and returns a newly GC-allocated dstring of the elements.

Parameters:

S s	the range to encode

Returns:

A UTF-32 string.

See Also:

For a lazy, non-allocating version of these functions, see byUTF.

Examples:

import std.algorithm.comparison : equal;

// these graphemes are two code units in UTF-16 and one in UTF-32
writeln("𤭢"w.length); // 2
writeln("𐐷"w.length); // 2

assert("𤭢"w.toUTF32.equal([0x00024B62]));
assert("𐐷"w.toUTF32.equal([0x00010437]));
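
The three conversion functions compose into a lossless round trip for well-formed input. A short sketch that also shows how the code unit count changes with each encoding:

```d
import std.utf : toUTF8, toUTF16, toUTF32;

void main()
{
    string s = "𐐷 and ø";

    wstring w = s.toUTF16;
    dstring d = w.toUTF32;

    // Seven code points: 11 UTF-8 units, 8 UTF-16 units, 7 UTF-32 units.
    assert(s.length == 11);
    assert(w.length == 8);
    assert(d.length == 7);

    // Converting back reproduces the original string.
    assert(d.toUTF8 == s);
}
```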

template toUTFz(P) if (is(P == C*, C) && isSomeChar!C)

Returns a C-style zero-terminated string equivalent to str. str must not contain embedded '\0' characters, as any C function will treat the first '\0' it sees as the end of the string. If str.empty is true, a string containing only '\0' is returned.

toUTFz accepts any type of string and is templated on the type of character pointer you wish to convert to. It avoids allocating a new string if it can, but there is a decent chance that it will have to allocate one, particularly when dealing with character types other than char.

Warning 1: If the result of toUTFz equals str.ptr, then if anything alters the character one past the end of str (which is the '\0' character terminating the string), the string will no longer be zero-terminated. The most likely scenarios are appending to str when no reallocation takes place, or str being a slice of a larger array and the character one past the end of str being altered in that larger array. Another case is a mutable character array located immediately after str in memory (for example, member variables of a user-defined type declared one right after the other) that happened to start with '\0'. Such scenarios never occur if you use the zero-terminated string immediately after calling toUTFz and the C function using it does not keep a reference to it. They are also unlikely even if you save the zero-terminated string (the cases above are among the few examples where it could happen). However, if you save the zero-terminated string and want to be absolutely certain that it stays zero-terminated, simply append a '\0' to the string and use its ptr property rather than calling toUTFz.

Warning 2: When passing a character pointer to a C function that keeps it around for any reason, make sure you keep a reference to it in your D code. Otherwise, it may go away during a garbage collection cycle and cause a nasty bug when the C code tries to use it.

Examples:

auto p1 = toUTFz!(char*)("hello world");
auto p2 = toUTFz!(const(char)*)("hello world");
auto p3 = toUTFz!(immutable(char)*)("hello world");
auto p4 = toUTFz!(char*)("hello world"d);
auto p5 = toUTFz!(const(wchar)*)("hello world");
auto p6 = toUTFz!(immutable(dchar)*)("hello world"w);
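
A typical use is handing a D string to a C function. The sketch below calls the C library's strlen as a stand-in for any C API, and then shows the alternative that Warning 1 recommends when the pointer must stay valid for longer.

```d
import core.stdc.string : strlen;
import std.utf : toUTFz;

void main()
{
    string s = "hello";

    // Use the pointer immediately; strlen does not keep a reference to it.
    const(char)* p = toUTFz!(const(char)*)(s);
    assert(strlen(p) == s.length);

    // To be certain the terminator stays in place, own it explicitly:
    // append '\0' and take the ptr of the new array.
    auto owned = s ~ '\0';
    const(char)* q = owned.ptr;
    assert(strlen(q) == s.length);
}
```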

pure @safe const(wchar)* toUTF16z(C)(const(C)[] str) if (isSomeChar!C);

toUTF16z is a convenience function for toUTFz!(const(wchar)*).

Encodes the string str into UTF-16 and returns the encoded string. toUTF16z is suitable for calling the 'W' functions in the Win32 API that take an LPCWSTR argument.

Examples:

string str = "Hello, World!";
const(wchar)* p = str.toUTF16z;
writeln(p[str.length]); // '\0'

pure nothrow @nogc @safe size_t count(C)(const(C)[] str) if (isSomeChar!C);

Returns the total number of code points encoded in str.

Supersedes: This function supersedes toUCSindex.

Standards:

Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252

Throws:

UTFException if str is not well-formed.

Examples:

writeln(count("")); // 0
writeln(count("a")); // 1
writeln(count("abc")); // 3
writeln(count("\u20AC100")); // 4
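
The difference between count and the array's length property is the difference between code points and code units. A brief sketch:

```d
import std.utf : count;

void main()
{
    // '€' takes three UTF-8 code units but is a single code point.
    string euro = "\u20AC100";
    assert(euro.length == 6);
    assert(count(euro) == 4);

    // In UTF-16 the same text happens to need one unit per code point.
    wstring euroW = "\u20AC100"w;
    assert(euroW.length == 4);
    assert(count(euroW) == 4);

    // A code point outside the BMP takes two UTF-16 code units.
    wstring deseret = "𐐷"w;
    assert(deseret.length == 2);
    assert(count(deseret) == 1);
}
```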

enum dchar replacementDchar;

Inserted in place of invalid UTF sequences.

References: https://en.wikipedia.org/wiki/Replacement_character#Replacement_character

auto byCodeUnit(R)(R r) if (isConvertibleToString!R && !isStaticArray!R || isInputRange!R && isSomeChar!(ElementEncodingType!R));

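Iterates a range of char, wchar, or dchar by code unit.

The purpose is to bypass the special-case decoding that std.range.primitives.front performs on character arrays. As a result, code using ranges with byCodeUnit can be nothrow, whereas std.range.primitives.front throws when it encounters invalid Unicode sequences.

A code unit is a building block of the UTF encodings. Generally, an individual code unit does not represent what is perceived as a full character (a grapheme cluster in Unicode terminology). Many characters are encoded with multiple code units. For example, the UTF-8 code units for ø are 0xC3 0xB8. That means an individual element of byCodeUnit often does not form a character on its own. Attempting to treat it as one gives nonsensical results.

Parameters:

R r	an input range of characters (including strings) or a type that implicitly converts to a string type

Returns:

If r is not an auto-decodable string (i.e. a narrow string or a user-defined type that implicitly converts to a string type), then r is returned.

Otherwise, r is converted to its corresponding string type (if it is not already a string) and wrapped in a random-access range whose element type is the element encoding type of the string (its code unit), and that range is returned. The range has slicing.

If r is a struct or class that is an input range of characters on its own (i.e. it has the input range API as member functions) and is also implicitly convertible to a string type, then r is returned and no implicit conversion takes place.

If r is wrapped in a new range, that range has a source property that returns the string currently contained within the range.

See Also:

Refer to the std.uni documentation for a reference on Unicode terminology.

For a range that iterates by grapheme cluster (written character), see std.uni.byGrapheme.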

Examples:

import std.range.primitives;
import std.traits : isAutodecodableString;

auto r = "Hello, World!".byCodeUnit();
static assert(hasLength!(typeof(r)));
static assert(hasSlicing!(typeof(r)));
static assert(isRandomAccessRange!(typeof(r)));
static assert(is(ElementType!(typeof(r)) == immutable char));

// Contrast the range above with the range capabilities of
// standard strings (with or without autodecoding enabled).
auto s = "Hello, World!";
static assert(isBidirectionalRange!(typeof(r)));
static if (isAutodecodableString!(typeof(s)))
{
    // With autodecoding enabled,
    // strings are non-random-access ranges of dchar.
    static assert(is(ElementType!(typeof(s)) == dchar));
    static assert(!isRandomAccessRange!(typeof(s)));
    static assert(!hasSlicing!(typeof(s)));
    static assert(!hasLength!(typeof(s)));
}
else
{
    // Without autodecoding, strings are normal arrays.
    static assert(is(ElementType!(typeof(s)) == immutable char));
    static assert(isRandomAccessRange!(typeof(s)));
    static assert(hasSlicing!(typeof(s)));
    static assert(hasLength!(typeof(s)));
}

Examples:

byCodeUnit does no Unicode decoding

string noel1 = "noe\u0308l"; // noël using e + combining diaeresis
assert(noel1.byCodeUnit[2] != 'ë');
writeln(noel1.byCodeUnit[2]); // 'e'

string noel2 = "no\u00EBl"; // noël using a precomposed ë character
// Because the string is UTF-8, the code unit at index 2 is only
// the first code unit of the sequence that encodes 'ë'
assert(noel2.byCodeUnit[2] != 'ë');

Examples:

byCodeUnit exposes a source property when wrapping narrow strings.

import std.algorithm.comparison : equal;
import std.range : popFrontN;
import std.traits : isAutodecodableString;
{
    auto range = byCodeUnit("hello world");
    range.popFrontN(3);
    assert(equal(range.save, "lo world"));
    static if (isAutodecodableString!string) // only valid with autodecoding
    {
        string str = range.source;
        writeln(str); // "lo world"
    }
}
// source only exists if the range was wrapped
{
    auto range = byCodeUnit("hello world"d);
    static assert(!__traits(compiles, range.source));
}
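
Because byCodeUnit never decodes, code that iterates it can carry strict attributes and can process input that is not valid UTF-8. The sketch below uses a hypothetical function, countAsciiSpaces; it assumes, as the description above suggests, that the wrapper's range primitives can be used from nothrow @nogc code.

```d
import std.utf : byCodeUnit;

/// Hypothetical helper: counts 0x20 code units without decoding.
size_t countAsciiSpaces(const(char)[] text) pure nothrow @nogc @safe
{
    size_t n;
    foreach (c; text.byCodeUnit)
    {
        if (c == ' ')
            ++n;
    }
    return n;
}

void main()
{
    assert(countAsciiSpaces("a b c") == 2);

    // Multi-unit sequences never contain the byte 0x20.
    assert(countAsciiSpaces("ø ø") == 1);

    // 0xFF is never valid in UTF-8, yet iteration does not throw.
    char[] bad = [0x20, 0xFF, 0x20];
    assert(countAsciiSpaces(bad) == 2);
}
```

Searching for an ASCII code unit this way is safe in UTF-8 because lead and continuation bytes all have the high bit set.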

alias byChar = byUTF!(char, Flag.yes).byUTF(R)(R r) if (isAutodecodableString!R && isInputRange!R && isSomeChar!(ElementEncodingType!R));
alias byWchar = byUTF!(wchar, Flag.yes).byUTF(R)(R r) if (isAutodecodableString!R && isInputRange!R && isSomeChar!(ElementEncodingType!R));
alias byDchar = byUTF!(dchar, Flag.yes).byUTF(R)(R r) if (isAutodecodableString!R && isInputRange!R && isSomeChar!(ElementEncodingType!R));

Iterates an input range of characters by char, wchar, or dchar. These aliases simply forward to byUTF with the corresponding C argument.

Parameters:

R r	input range of characters, or array of characters
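
As an illustration of the three aliases, the sketch below views the same characters as UTF-8, UTF-16 and UTF-32 code units. The expected values are the standard encodings of ø (U+00F8) and 𐐷 (U+10437).

```d
import std.algorithm.comparison : equal;
import std.utf : byChar, byWchar, byDchar;

void main()
{
    // ø is two UTF-8 code units and one UTF-16 code unit.
    assert("ø".byChar.equal([0xC3, 0xB8]));
    assert("ø".byWchar.equal([0xF8]));

    // 𐐷 becomes a surrogate pair in UTF-16.
    assert("𐐷".byWchar.equal([0xD801, 0xDC37]));

    // The source encoding does not matter: a wstring can be read by dchar.
    assert("𐐷"w.byDchar.equal([0x10437]));
    assert("𐐷".byDchar.equal("𐐷"d));
}
```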

template byUTF(C, UseReplacementDchar useReplacementDchar = Yes.useReplacementDchar) if (isSomeChar!C)

Iterates an input range of characters by character type C, encoding the elements of the range.

UTF sequences that cannot be converted to the specified encoding are either replaced by U+FFFD, per "5.22 Best Practice for U+FFFD Substitution" of the Unicode Standard, or result in a thrown UTFException. Hence byUTF is not symmetric. The algorithm is lazy and does not allocate memory. @nogc, pure-ity, nothrow, and @safe-ty are inferred from the r parameter.

Parameters:

C	char, wchar, or dchar
useReplacementDchar	UseReplacementDchar.yes means replace invalid UTF with replacementDchar; UseReplacementDchar.no means throw a UTFException for invalid UTF

Throws:

UTFException if the UTF sequence is invalid and useReplacementDchar is set to UseReplacementDchar.no.

GC: Does not use the GC if useReplacementDchar is set to UseReplacementDchar.yes.

Returns:

A bidirectional range if R is a bidirectional range and not auto-decodable, as defined by std.traits.isAutodecodableString.

A forward range if R is a forward range and not auto-decodable.

Or, if R is a range, is auto-decodable, and is(ElementEncodingType!typeof(r) == C), then the range is passed to byCodeUnit.

Otherwise, an input range of characters.

Examples:

import std.algorithm.comparison : equal;

// hellö as a range of `char`s, which are UTF-8
assert("hell\u00F6".byUTF!char().equal(['h', 'e', 'l', 'l', 0xC3, 0xB6]));

// `wchar`s are able to hold the ö in a single element (UTF-16 code unit)
assert("hell\u00F6".byUTF!wchar().equal(['h', 'e', 'l', 'l', 'ö']));

// 𐐷 is four code units in UTF-8, two in UTF-16, and one in UTF-32
assert("𐐷".byUTF!char().equal([0xF0, 0x90, 0x90, 0xB7]));
assert("𐐷".byUTF!wchar().equal([0xD801, 0xDC37]));
assert("𐐷".byUTF!dchar().equal([0x00010437]));

Examples:

import std.algorithm.comparison : equal;
import std.exception : assertThrown;

assert("hello\xF0betty".byChar.byUTF!(dchar, UseReplacementDchar.yes).equal("hello\uFFFDetty"));
assertThrown!UTFException("hello\xF0betty".byChar.byUTF!(dchar, UseReplacementDchar.no).equal("hello betty"));

Examples:

import std.range.primitives;
wchar[] s = ['ă', 'î'];

auto rc = s.byUTF!char;
static assert(isBidirectionalRange!(typeof(rc)));
writeln(rc.back); // 0xae
rc.popBack;
writeln(rc.back); // 0xc3
rc.popBack;
writeln(rc.back); // 0x83
rc.popBack;
writeln(rc.back); // 0xc4

auto rw = s.byUTF!wchar;
static assert(isBidirectionalRange!(typeof(rw)));
writeln(rw.back); // 'î'
rw.popBack;
writeln(rw.back); // 'ă'

auto rd = s.byUTF!dchar;
static assert(isBidirectionalRange!(typeof(rd)));
writeln(rd.back); // 'î'
rd.popBack;
writeln(rd.back); // 'ă'
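
Because byUTF is lazy, it can answer questions about another encoding without building the converted string. The sketch below measures the UTF-16 length of a UTF-8 string through a hypothetical helper, utf16Length, and cross-checks it against codeLength.

```d
import std.range.primitives : walkLength;
import std.utf : byUTF, codeLength;

/// Hypothetical helper: number of UTF-16 code units needed for `s`.
size_t utf16Length(string s)
{
    // Transcodes element by element; no wstring is allocated.
    return s.byUTF!wchar().walkLength;
}

void main()
{
    string s = "a𐐷ø";

    // 'a' and 'ø' take one UTF-16 unit each, '𐐷' takes two.
    assert(utf16Length(s) == 4);
    assert(codeLength!wchar(s) == 4);

    // Iterating by dchar yields one element per code point.
    assert(s.byUTF!dchar().walkLength == 3);
}
```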

Translated with the DeepL API, with corrections in places.
dmd version at translation time: 2.108.0
dmd version of the documentation: 2.109.1
Translation date: 2024-04-13 01:16:06+09:00
HTML generated: 2025-01-09 08:10:33+09:00
Editor: dokutoku
