annotate lib/Support/ConvertUTF.c @ 77:54457678186b LLVM3.6

LLVM 3.6
author Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
date Mon, 08 Sep 2014 22:06:00 +0900
parents 95c75e76d11b
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
1 /*===--- ConvertUTF.c - Universal Character Names conversions ---------------===
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
2 *
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
3 * The LLVM Compiler Infrastructure
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
4 *
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
5 * This file is distributed under the University of Illinois Open Source
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
6 * License. See LICENSE.TXT for details.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
7 *
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
8 *===------------------------------------------------------------------------=*/
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
9 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
10 * Copyright 2001-2004 Unicode, Inc.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
11 *
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
12 * Disclaimer
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
13 *
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
14 * This source code is provided as is by Unicode, Inc. No claims are
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
15 * made as to fitness for any particular purpose. No warranties of any
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
16 * kind are expressed or implied. The recipient agrees to determine
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
17 * applicability of information provided. If this file has been
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
18 * purchased on magnetic or optical media from Unicode, Inc., the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
19 * sole remedy for any claim will be exchange of defective media
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
20 * within 90 days of receipt.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
21 *
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
22 * Limitations on Rights to Redistribute This Code
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
23 *
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
24 * Unicode, Inc. hereby grants the right to freely use the information
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
25 * supplied in this file in the creation of products supporting the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
26 * Unicode Standard, and to make copies of this file in any form
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
27 * for internal or external distribution as long as this notice
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
28 * remains attached.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
29 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
30
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
31 /* ---------------------------------------------------------------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
32
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
33 Conversions between UTF32, UTF-16, and UTF-8. Source code file.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
34 Author: Mark E. Davis, 1994.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
35 Rev History: Rick McGowan, fixes & updates May 2001.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
36 Sept 2001: fixed const & error conditions per
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
37 mods suggested by S. Parent & A. Lillich.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
38 June 2002: Tim Dodd added detection and handling of incomplete
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
39 source sequences, enhanced error detection, added casts
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
40 to eliminate compiler warnings.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
41 July 2003: slight mods to back out aggressive FFFE detection.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
42 Jan 2004: updated switches in from-UTF8 conversions.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
43 Oct 2004: updated to use UNI_MAX_LEGAL_UTF32 in UTF-32 conversions.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
44
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
45 See the header file "ConvertUTF.h" for complete documentation.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
46
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
47 ------------------------------------------------------------------------ */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
48
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
49
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
50 #include "llvm/Support/ConvertUTF.h"
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
51 #ifdef CVTUTF_DEBUG
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
52 #include <stdio.h>
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
53 #endif
77
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
54 #include <assert.h>
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
55
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
56 static const int halfShift = 10; /* used for shifting by 10 bits */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
57
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
58 static const UTF32 halfBase = 0x0010000UL;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
59 static const UTF32 halfMask = 0x3FFUL;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
60
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
61 #define UNI_SUR_HIGH_START (UTF32)0xD800
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
62 #define UNI_SUR_HIGH_END (UTF32)0xDBFF
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
63 #define UNI_SUR_LOW_START (UTF32)0xDC00
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
64 #define UNI_SUR_LOW_END (UTF32)0xDFFF
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
65 #define false 0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
66 #define true 1
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
67
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
68 /* --------------------------------------------------------------------- */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
69
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
70 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
71 * Index into the table below with the first byte of a UTF-8 sequence to
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
72 * get the number of trailing bytes that are supposed to follow it.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
73 * Note that *legal* UTF-8 values can't have 4 or 5-bytes. The table is
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
74 * left as-is for anyone who may want to do such conversion, which was
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
75 * allowed in earlier algorithms.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
76 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
77 static const char trailingBytesForUTF8[256] = {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
78 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
79 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
80 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
81 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
82 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
83 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
84 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
85 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
86 };
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
87
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
88 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
89 * Magic values subtracted from a buffer value during UTF8 conversion.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
90 * This table contains as many values as there might be trailing bytes
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
91 * in a UTF-8 sequence.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
92 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
93 static const UTF32 offsetsFromUTF8[6] = { 0x00000000UL, 0x00003080UL, 0x000E2080UL,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
94 0x03C82080UL, 0xFA082080UL, 0x82082080UL };
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
95
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
96 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
97 * Once the bits are split out into bytes of UTF-8, this is a mask OR-ed
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
98 * into the first byte, depending on how many bytes follow. There are
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
99 * as many entries in this table as there are UTF-8 sequence types.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
100 * (I.e., one byte sequence, two byte... etc.). Remember that sequencs
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
101 * for *legal* UTF-8 will be 4 or fewer bytes total.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
102 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
103 static const UTF8 firstByteMark[7] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
104
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
105 /* --------------------------------------------------------------------- */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
106
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
107 /* The interface converts a whole buffer to avoid function-call overhead.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
108 * Constants have been gathered. Loops & conditionals have been removed as
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
109 * much as possible for efficiency, in favor of drop-through switches.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
110 * (See "Note A" at the bottom of the file for equivalent code.)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
111 * If your compiler supports it, the "isLegalUTF8" call can be turned
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
112 * into an inline function.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
113 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
114
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
115
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
116 /* --------------------------------------------------------------------- */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
117
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
118 ConversionResult ConvertUTF32toUTF16 (
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
119 const UTF32** sourceStart, const UTF32* sourceEnd,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
120 UTF16** targetStart, UTF16* targetEnd, ConversionFlags flags) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
121 ConversionResult result = conversionOK;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
122 const UTF32* source = *sourceStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
123 UTF16* target = *targetStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
124 while (source < sourceEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
125 UTF32 ch;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
126 if (target >= targetEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
127 result = targetExhausted; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
128 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
129 ch = *source++;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
130 if (ch <= UNI_MAX_BMP) { /* Target is a character <= 0xFFFF */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
131 /* UTF-16 surrogate values are illegal in UTF-32; 0xffff or 0xfffe are both reserved values */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
132 if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
133 if (flags == strictConversion) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
134 --source; /* return to the illegal value itself */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
135 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
136 break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
137 } else {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
138 *target++ = UNI_REPLACEMENT_CHAR;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
139 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
140 } else {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
141 *target++ = (UTF16)ch; /* normal case */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
142 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
143 } else if (ch > UNI_MAX_LEGAL_UTF32) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
144 if (flags == strictConversion) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
145 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
146 } else {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
147 *target++ = UNI_REPLACEMENT_CHAR;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
148 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
149 } else {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
150 /* target is a character in range 0xFFFF - 0x10FFFF. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
151 if (target + 1 >= targetEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
152 --source; /* Back up source pointer! */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
153 result = targetExhausted; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
154 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
155 ch -= halfBase;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
156 *target++ = (UTF16)((ch >> halfShift) + UNI_SUR_HIGH_START);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
157 *target++ = (UTF16)((ch & halfMask) + UNI_SUR_LOW_START);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
158 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
159 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
160 *sourceStart = source;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
161 *targetStart = target;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
162 return result;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
163 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
164
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
165 /* --------------------------------------------------------------------- */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
166
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
167 ConversionResult ConvertUTF16toUTF32 (
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
168 const UTF16** sourceStart, const UTF16* sourceEnd,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
169 UTF32** targetStart, UTF32* targetEnd, ConversionFlags flags) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
170 ConversionResult result = conversionOK;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
171 const UTF16* source = *sourceStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
172 UTF32* target = *targetStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
173 UTF32 ch, ch2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
174 while (source < sourceEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
175 const UTF16* oldSource = source; /* In case we have to back up because of target overflow. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
176 ch = *source++;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
177 /* If we have a surrogate pair, convert to UTF32 first. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
178 if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_HIGH_END) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
179 /* If the 16 bits following the high surrogate are in the source buffer... */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
180 if (source < sourceEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
181 ch2 = *source;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
182 /* If it's a low surrogate, convert to UTF32. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
183 if (ch2 >= UNI_SUR_LOW_START && ch2 <= UNI_SUR_LOW_END) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
184 ch = ((ch - UNI_SUR_HIGH_START) << halfShift)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
185 + (ch2 - UNI_SUR_LOW_START) + halfBase;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
186 ++source;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
187 } else if (flags == strictConversion) { /* it's an unpaired high surrogate */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
188 --source; /* return to the illegal value itself */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
189 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
190 break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
191 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
192 } else { /* We don't have the 16 bits following the high surrogate. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
193 --source; /* return to the high surrogate */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
194 result = sourceExhausted;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
195 break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
196 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
197 } else if (flags == strictConversion) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
198 /* UTF-16 surrogate values are illegal in UTF-32 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
199 if (ch >= UNI_SUR_LOW_START && ch <= UNI_SUR_LOW_END) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
200 --source; /* return to the illegal value itself */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
201 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
202 break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
203 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
204 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
205 if (target >= targetEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
206 source = oldSource; /* Back up source pointer! */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
207 result = targetExhausted; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
208 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
209 *target++ = ch;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
210 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
211 *sourceStart = source;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
212 *targetStart = target;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
213 #ifdef CVTUTF_DEBUG
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
214 if (result == sourceIllegal) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
215 fprintf(stderr, "ConvertUTF16toUTF32 illegal seq 0x%04x,%04x\n", ch, ch2);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
216 fflush(stderr);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
217 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
218 #endif
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
219 return result;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
220 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
221 ConversionResult ConvertUTF16toUTF8 (
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
222 const UTF16** sourceStart, const UTF16* sourceEnd,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
223 UTF8** targetStart, UTF8* targetEnd, ConversionFlags flags) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
224 ConversionResult result = conversionOK;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
225 const UTF16* source = *sourceStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
226 UTF8* target = *targetStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
227 while (source < sourceEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
228 UTF32 ch;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
229 unsigned short bytesToWrite = 0;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
230 const UTF32 byteMask = 0xBF;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
231 const UTF32 byteMark = 0x80;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
232 const UTF16* oldSource = source; /* In case we have to back up because of target overflow. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
233 ch = *source++;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
234 /* If we have a surrogate pair, convert to UTF32 first. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
235 if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_HIGH_END) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
236 /* If the 16 bits following the high surrogate are in the source buffer... */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
237 if (source < sourceEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
238 UTF32 ch2 = *source;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
239 /* If it's a low surrogate, convert to UTF32. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
240 if (ch2 >= UNI_SUR_LOW_START && ch2 <= UNI_SUR_LOW_END) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
241 ch = ((ch - UNI_SUR_HIGH_START) << halfShift)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
242 + (ch2 - UNI_SUR_LOW_START) + halfBase;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
243 ++source;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
244 } else if (flags == strictConversion) { /* it's an unpaired high surrogate */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
245 --source; /* return to the illegal value itself */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
246 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
247 break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
248 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
249 } else { /* We don't have the 16 bits following the high surrogate. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
250 --source; /* return to the high surrogate */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
251 result = sourceExhausted;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
252 break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
253 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
254 } else if (flags == strictConversion) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
255 /* UTF-16 surrogate values are illegal in UTF-32 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
256 if (ch >= UNI_SUR_LOW_START && ch <= UNI_SUR_LOW_END) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
257 --source; /* return to the illegal value itself */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
258 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
259 break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
260 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
261 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
262 /* Figure out how many bytes the result will require */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
263 if (ch < (UTF32)0x80) { bytesToWrite = 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
264 } else if (ch < (UTF32)0x800) { bytesToWrite = 2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
265 } else if (ch < (UTF32)0x10000) { bytesToWrite = 3;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
266 } else if (ch < (UTF32)0x110000) { bytesToWrite = 4;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
267 } else { bytesToWrite = 3;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
268 ch = UNI_REPLACEMENT_CHAR;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
269 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
270
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
271 target += bytesToWrite;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
272 if (target > targetEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
273 source = oldSource; /* Back up source pointer! */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
274 target -= bytesToWrite; result = targetExhausted; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
275 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
276 switch (bytesToWrite) { /* note: everything falls through. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
277 case 4: *--target = (UTF8)((ch | byteMark) & byteMask); ch >>= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
278 case 3: *--target = (UTF8)((ch | byteMark) & byteMask); ch >>= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
279 case 2: *--target = (UTF8)((ch | byteMark) & byteMask); ch >>= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
280 case 1: *--target = (UTF8)(ch | firstByteMark[bytesToWrite]);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
281 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
282 target += bytesToWrite;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
283 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
284 *sourceStart = source;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
285 *targetStart = target;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
286 return result;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
287 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
288
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
289 /* --------------------------------------------------------------------- */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
290
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
291 ConversionResult ConvertUTF32toUTF8 (
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
292 const UTF32** sourceStart, const UTF32* sourceEnd,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
293 UTF8** targetStart, UTF8* targetEnd, ConversionFlags flags) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
294 ConversionResult result = conversionOK;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
295 const UTF32* source = *sourceStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
296 UTF8* target = *targetStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
297 while (source < sourceEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
298 UTF32 ch;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
299 unsigned short bytesToWrite = 0;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
300 const UTF32 byteMask = 0xBF;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
301 const UTF32 byteMark = 0x80;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
302 ch = *source++;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
303 if (flags == strictConversion ) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
304 /* UTF-16 surrogate values are illegal in UTF-32 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
305 if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
306 --source; /* return to the illegal value itself */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
307 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
308 break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
309 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
310 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
311 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
312 * Figure out how many bytes the result will require. Turn any
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
313 * illegally large UTF32 things (> Plane 17) into replacement chars.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
314 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
315 if (ch < (UTF32)0x80) { bytesToWrite = 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
316 } else if (ch < (UTF32)0x800) { bytesToWrite = 2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
317 } else if (ch < (UTF32)0x10000) { bytesToWrite = 3;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
318 } else if (ch <= UNI_MAX_LEGAL_UTF32) { bytesToWrite = 4;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
319 } else { bytesToWrite = 3;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
320 ch = UNI_REPLACEMENT_CHAR;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
321 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
322 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
323
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
324 target += bytesToWrite;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
325 if (target > targetEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
326 --source; /* Back up source pointer! */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
327 target -= bytesToWrite; result = targetExhausted; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
328 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
329 switch (bytesToWrite) { /* note: everything falls through. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
330 case 4: *--target = (UTF8)((ch | byteMark) & byteMask); ch >>= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
331 case 3: *--target = (UTF8)((ch | byteMark) & byteMask); ch >>= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
332 case 2: *--target = (UTF8)((ch | byteMark) & byteMask); ch >>= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
333 case 1: *--target = (UTF8) (ch | firstByteMark[bytesToWrite]);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
334 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
335 target += bytesToWrite;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
336 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
337 *sourceStart = source;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
338 *targetStart = target;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
339 return result;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
340 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
341
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
342 /* --------------------------------------------------------------------- */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
343
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
344 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
345 * Utility routine to tell whether a sequence of bytes is legal UTF-8.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
346 * This must be called with the length pre-determined by the first byte.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
347 * If not calling this from ConvertUTF8to*, then the length can be set by:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
348 * length = trailingBytesForUTF8[*source]+1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
349 * and the sequence is illegal right away if there aren't that many bytes
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
350 * available.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
351 * If presented with a length > 4, this returns false. The Unicode
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
352 * definition of UTF-8 goes up to 4-byte sequences.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
353 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
354
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
355 static Boolean isLegalUTF8(const UTF8 *source, int length) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
356 UTF8 a;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
357 const UTF8 *srcptr = source+length;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
358 switch (length) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
359 default: return false;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
360 /* Everything else falls through when "true"... */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
361 case 4: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
362 case 3: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
363 case 2: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
364
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
365 switch (*source) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
366 /* no fall-through in this inner switch */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
367 case 0xE0: if (a < 0xA0) return false; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
368 case 0xED: if (a > 0x9F) return false; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
369 case 0xF0: if (a < 0x90) return false; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
370 case 0xF4: if (a > 0x8F) return false; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
371 default: if (a < 0x80) return false;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
372 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
373
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
374 case 1: if (*source >= 0x80 && *source < 0xC2) return false;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
375 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
376 if (*source > 0xF4) return false;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
377 return true;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
378 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
379
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
380 /* --------------------------------------------------------------------- */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
381
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
382 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
383 * Exported function to return whether a UTF-8 sequence is legal or not.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
384 * This is not used here; it's just exported.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
385 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
386 Boolean isLegalUTF8Sequence(const UTF8 *source, const UTF8 *sourceEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
387 int length = trailingBytesForUTF8[*source]+1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
388 if (length > sourceEnd - source) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
389 return false;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
390 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
391 return isLegalUTF8(source, length);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
392 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
393
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
394 /* --------------------------------------------------------------------- */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
395
77
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
396 static unsigned
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
397 findMaximalSubpartOfIllFormedUTF8Sequence(const UTF8 *source,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
398 const UTF8 *sourceEnd) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
399 UTF8 b1, b2, b3;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
400
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
401 assert(!isLegalUTF8Sequence(source, sourceEnd));
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
402
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
403 /*
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
404 * Unicode 6.3.0, D93b:
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
405 *
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
406 * Maximal subpart of an ill-formed subsequence: The longest code unit
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
407 * subsequence starting at an unconvertible offset that is either:
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
408 * a. the initial subsequence of a well-formed code unit sequence, or
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
409 * b. a subsequence of length one.
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
410 */
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
411
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
412 if (source == sourceEnd)
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
413 return 0;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
414
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
415 /*
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
416 * Perform case analysis. See Unicode 6.3.0, Table 3-7. Well-Formed UTF-8
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
417 * Byte Sequences.
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
418 */
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
419
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
420 b1 = *source;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
421 ++source;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
422 if (b1 >= 0xC2 && b1 <= 0xDF) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
423 /*
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
424 * First byte is valid, but we know that this code unit sequence is
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
425 * invalid, so the maximal subpart has to end after the first byte.
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
426 */
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
427 return 1;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
428 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
429
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
430 if (source == sourceEnd)
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
431 return 1;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
432
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
433 b2 = *source;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
434 ++source;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
435
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
436 if (b1 == 0xE0) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
437 return (b2 >= 0xA0 && b2 <= 0xBF) ? 2 : 1;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
438 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
439 if (b1 >= 0xE1 && b1 <= 0xEC) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
440 return (b2 >= 0x80 && b2 <= 0xBF) ? 2 : 1;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
441 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
442 if (b1 == 0xED) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
443 return (b2 >= 0x80 && b2 <= 0x9F) ? 2 : 1;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
444 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
445 if (b1 >= 0xEE && b1 <= 0xEF) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
446 return (b2 >= 0x80 && b2 <= 0xBF) ? 2 : 1;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
447 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
448 if (b1 == 0xF0) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
449 if (b2 >= 0x90 && b2 <= 0xBF) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
450 if (source == sourceEnd)
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
451 return 2;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
452
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
453 b3 = *source;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
454 return (b3 >= 0x80 && b3 <= 0xBF) ? 3 : 2;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
455 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
456 return 1;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
457 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
458 if (b1 >= 0xF1 && b1 <= 0xF3) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
459 if (b2 >= 0x80 && b2 <= 0xBF) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
460 if (source == sourceEnd)
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
461 return 2;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
462
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
463 b3 = *source;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
464 return (b3 >= 0x80 && b3 <= 0xBF) ? 3 : 2;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
465 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
466 return 1;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
467 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
468 if (b1 == 0xF4) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
469 if (b2 >= 0x80 && b2 <= 0x8F) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
470 if (source == sourceEnd)
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
471 return 2;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
472
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
473 b3 = *source;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
474 return (b3 >= 0x80 && b3 <= 0xBF) ? 3 : 2;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
475 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
476 return 1;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
477 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
478
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
479 assert((b1 >= 0x80 && b1 <= 0xC1) || b1 >= 0xF5);
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
480 /*
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
481 * There are no valid sequences that start with these bytes. Maximal subpart
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
482 * is defined to have length 1 in these cases.
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
483 */
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
484 return 1;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
485 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
486
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
487 /* --------------------------------------------------------------------- */
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
488
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
489 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
490 * Exported function to return the total number of bytes in a codepoint
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
491 * represented in UTF-8, given the value of the first byte.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
492 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
493 unsigned getNumBytesForUTF8(UTF8 first) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
494 return trailingBytesForUTF8[first] + 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
495 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
496
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
497 /* --------------------------------------------------------------------- */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
498
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
499 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
500 * Exported function to return whether a UTF-8 string is legal or not.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
501 * This is not used here; it's just exported.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
502 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
503 Boolean isLegalUTF8String(const UTF8 **source, const UTF8 *sourceEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
504 while (*source != sourceEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
505 int length = trailingBytesForUTF8[**source] + 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
506 if (length > sourceEnd - *source || !isLegalUTF8(*source, length))
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
507 return false;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
508 *source += length;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
509 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
510 return true;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
511 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
512
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
513 /* --------------------------------------------------------------------- */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
514
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
515 ConversionResult ConvertUTF8toUTF16 (
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
516 const UTF8** sourceStart, const UTF8* sourceEnd,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
517 UTF16** targetStart, UTF16* targetEnd, ConversionFlags flags) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
518 ConversionResult result = conversionOK;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
519 const UTF8* source = *sourceStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
520 UTF16* target = *targetStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
521 while (source < sourceEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
522 UTF32 ch = 0;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
523 unsigned short extraBytesToRead = trailingBytesForUTF8[*source];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
524 if (extraBytesToRead >= sourceEnd - source) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
525 result = sourceExhausted; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
526 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
527 /* Do this check whether lenient or strict */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
528 if (!isLegalUTF8(source, extraBytesToRead+1)) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
529 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
530 break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
531 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
532 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
533 * The cases all fall through. See "Note A" below.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
534 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
535 switch (extraBytesToRead) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
536 case 5: ch += *source++; ch <<= 6; /* remember, illegal UTF-8 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
537 case 4: ch += *source++; ch <<= 6; /* remember, illegal UTF-8 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
538 case 3: ch += *source++; ch <<= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
539 case 2: ch += *source++; ch <<= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
540 case 1: ch += *source++; ch <<= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
541 case 0: ch += *source++;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
542 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
543 ch -= offsetsFromUTF8[extraBytesToRead];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
544
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
545 if (target >= targetEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
546 source -= (extraBytesToRead+1); /* Back up source pointer! */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
547 result = targetExhausted; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
548 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
549 if (ch <= UNI_MAX_BMP) { /* Target is a character <= 0xFFFF */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
550 /* UTF-16 surrogate values are illegal in UTF-32 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
551 if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
552 if (flags == strictConversion) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
553 source -= (extraBytesToRead+1); /* return to the illegal value itself */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
554 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
555 break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
556 } else {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
557 *target++ = UNI_REPLACEMENT_CHAR;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
558 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
559 } else {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
560 *target++ = (UTF16)ch; /* normal case */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
561 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
562 } else if (ch > UNI_MAX_UTF16) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
563 if (flags == strictConversion) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
564 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
565 source -= (extraBytesToRead+1); /* return to the start */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
566 break; /* Bail out; shouldn't continue */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
567 } else {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
568 *target++ = UNI_REPLACEMENT_CHAR;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
569 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
570 } else {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
571 /* target is a character in range 0xFFFF - 0x10FFFF. */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
572 if (target + 1 >= targetEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
573 source -= (extraBytesToRead+1); /* Back up source pointer! */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
574 result = targetExhausted; break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
575 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
576 ch -= halfBase;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
577 *target++ = (UTF16)((ch >> halfShift) + UNI_SUR_HIGH_START);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
578 *target++ = (UTF16)((ch & halfMask) + UNI_SUR_LOW_START);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
579 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
580 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
581 *sourceStart = source;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
582 *targetStart = target;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
583 return result;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
584 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
585
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
586 /* --------------------------------------------------------------------- */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
587
77
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
588 static ConversionResult ConvertUTF8toUTF32Impl(
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
589 const UTF8** sourceStart, const UTF8* sourceEnd,
77
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
590 UTF32** targetStart, UTF32* targetEnd, ConversionFlags flags,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
591 Boolean InputIsPartial) {
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
592 ConversionResult result = conversionOK;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
593 const UTF8* source = *sourceStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
594 UTF32* target = *targetStart;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
595 while (source < sourceEnd) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
596 UTF32 ch = 0;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
597 unsigned short extraBytesToRead = trailingBytesForUTF8[*source];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
598 if (extraBytesToRead >= sourceEnd - source) {
77
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
599 if (flags == strictConversion || InputIsPartial) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
600 result = sourceExhausted;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
601 break;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
602 } else {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
603 result = sourceIllegal;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
604
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
605 /*
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
606 * Replace the maximal subpart of ill-formed sequence with
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
607 * replacement character.
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
608 */
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
609 source += findMaximalSubpartOfIllFormedUTF8Sequence(source,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
610 sourceEnd);
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
611 *target++ = UNI_REPLACEMENT_CHAR;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
612 continue;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
613 }
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
614 }
77
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
615 if (target >= targetEnd) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
616 result = targetExhausted; break;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
617 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
618
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
619 /* Do this check whether lenient or strict */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
620 if (!isLegalUTF8(source, extraBytesToRead+1)) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
621 result = sourceIllegal;
77
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
622 if (flags == strictConversion) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
623 /* Abort conversion. */
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
624 break;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
625 } else {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
626 /*
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
627 * Replace the maximal subpart of ill-formed sequence with
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
628 * replacement character.
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
629 */
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
630 source += findMaximalSubpartOfIllFormedUTF8Sequence(source,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
631 sourceEnd);
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
632 *target++ = UNI_REPLACEMENT_CHAR;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
633 continue;
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
634 }
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
635 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
636 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
637 * The cases all fall through. See "Note A" below.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
638 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
639 switch (extraBytesToRead) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
640 case 5: ch += *source++; ch <<= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
641 case 4: ch += *source++; ch <<= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
642 case 3: ch += *source++; ch <<= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
643 case 2: ch += *source++; ch <<= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
644 case 1: ch += *source++; ch <<= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
645 case 0: ch += *source++;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
646 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
647 ch -= offsetsFromUTF8[extraBytesToRead];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
648
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
649 if (ch <= UNI_MAX_LEGAL_UTF32) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
650 /*
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
651 * UTF-16 surrogate values are illegal in UTF-32, and anything
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
652 * over Plane 17 (> 0x10FFFF) is illegal.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
653 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
654 if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_LOW_END) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
655 if (flags == strictConversion) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
656 source -= (extraBytesToRead+1); /* return to the illegal value itself */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
657 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
658 break;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
659 } else {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
660 *target++ = UNI_REPLACEMENT_CHAR;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
661 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
662 } else {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
663 *target++ = ch;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
664 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
665 } else { /* i.e., ch > UNI_MAX_LEGAL_UTF32 */
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
666 result = sourceIllegal;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
667 *target++ = UNI_REPLACEMENT_CHAR;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
668 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
669 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
670 *sourceStart = source;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
671 *targetStart = target;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
672 return result;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
673 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
674
77
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
675 ConversionResult ConvertUTF8toUTF32Partial(const UTF8 **sourceStart,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
676 const UTF8 *sourceEnd,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
677 UTF32 **targetStart,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
678 UTF32 *targetEnd,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
679 ConversionFlags flags) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
680 return ConvertUTF8toUTF32Impl(sourceStart, sourceEnd, targetStart, targetEnd,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
681 flags, /*InputIsPartial=*/true);
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
682 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
683
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
684 ConversionResult ConvertUTF8toUTF32(const UTF8 **sourceStart,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
685 const UTF8 *sourceEnd, UTF32 **targetStart,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
686 UTF32 *targetEnd, ConversionFlags flags) {
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
687 return ConvertUTF8toUTF32Impl(sourceStart, sourceEnd, targetStart, targetEnd,
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
688 flags, /*InputIsPartial=*/false);
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
689 }
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
690
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
691 /* ---------------------------------------------------------------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
692
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
693 Note A.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
694 The fall-through switches in UTF-8 reading code save a
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
695 temp variable, some decrements & conditionals. The switches
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
696 are equivalent to the following loop:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
697 {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
698 int tmpBytesToRead = extraBytesToRead+1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
699 do {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
700 ch += *source++;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
701 --tmpBytesToRead;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
702 if (tmpBytesToRead) ch <<= 6;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
703 } while (tmpBytesToRead > 0);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
704 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
705 In UTF-8 writing code, the switches on "bytesToWrite" are
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
706 similarly unrolled loops.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
707
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
708 --------------------------------------------------------------------- */