Friday, February 28, 2014

Document: The New C Standard, P5 (docx)

5.2.4.2.2 Characteristics of floating types <float.h>
372
Coding Guidelines
The usage of these macros in existing code is so rare that reliable information on incorrect usage is not
available, making it impossible to provide any guideline recommendations. (The rare usage could also imply
that a guideline recommendation would not be worthwhile.)
371
— minimum negative integer such that 10 raised to that power is in the range of normalized floating-point
numbers, $\lceil \log_{10} b^{e_{min}-1} \rceil$

FLT_MIN_10_EXP -37
DBL_MIN_10_EXP -37
LDBL_MIN_10_EXP -37
Commentary
Making this information available as an integer constant allows it to be accessed in a #if preprocessing
directive.
These are the exponent values for normalized numbers. If subnormal numbers (see 338, subnormal numbers)
are supported, the smallest representable value is likely to have an exponent whose value is FLT_DIG,
DBL_DIG, and LDBL_DIG less than (toward negative infinity) these values, respectively.
The Committee is being very conservative in specifying the minimum values for the exponents of the
types double and long double. An implementation is permitted to define the same range of exponents for
all floating-point types. There may be normalized numbers whose respective exponent value is smaller than
the values given for these macros; for instance, the exponents appearing in the *_MIN macros. The power of
10 exponent values given for these *_MIN_10_EXP macros can be applied to any normalized significand.
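The use of these macros in a #if directive can be sketched as follows. This is a minimal illustration, not taken from the source: the names REQUIRED_MIN_10_EXP, small_real, and SMALL_REAL_MIN_10_EXP are hypothetical, and the required exponent of -100 is an assumed application requirement.

```c
#include <float.h>

/* Hypothetical application requirement: 1e-100 must be a normalized value. */
#define REQUIRED_MIN_10_EXP (-100)

/* Select, at translation time, the narrowest type covering the requirement. */
#if FLT_MIN_10_EXP <= REQUIRED_MIN_10_EXP
typedef float small_real;
#define SMALL_REAL_MIN_10_EXP FLT_MIN_10_EXP
#elif DBL_MIN_10_EXP <= REQUIRED_MIN_10_EXP
typedef double small_real;
#define SMALL_REAL_MIN_10_EXP DBL_MIN_10_EXP
#else
#error "neither float nor double covers the required exponent range"
#endif
```

Because the standard only guarantees -37 for all three types, a requirement such as this one depends on the implementation; the #error branch makes that dependency visible when the code is translated.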
C++
18.2.1.2p25
static const int min_exponent10;
Minimum negative integer such that 10 raised to that power is in the range of normalised floating point
numbers. 190)
Footnote 190
Equivalent to FLT_MIN_10_EXP, DBL_MIN_10_EXP, LDBL_MIN_10_EXP.
18.2.2p4
Header <cfloat> (Table 17): . . . The contents are the same as the Standard C library header <float.h>.
Common Implementations
The value of DBL_MIN_10_EXP is usually the same as FLT_MIN_10_EXP or LDBL_MIN_10_EXP. In the latter
case a value of -307 is often seen.
372
— maximum integer such that
FLT_RADIX raised to one less than that power is a representable finite floating-point number, $e_{max}$
FLT_MAX_EXP
DBL_MAX_EXP
LDBL_MAX_EXP
Commentary
FLT_RADIX raised to the power *_MAX_EXP is the smallest value that is too large to be represented because
of the limited exponent range (see 370, *_MIN_EXP).
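The relationship between DBL_MAX_EXP and the largest finite values can be checked with a short sketch. The function names are illustrative only, and the second function assumes an implementation (such as one using IEC 60559 arithmetic) where an out-of-range scalbn result compares greater than DBL_MAX.

```c
#include <float.h>
#include <math.h>

/* FLT_RADIX raised to (DBL_MAX_EXP - 1): the largest power of the radix
   that is a finite double. */
double largest_radix_power(void)
{
    return scalbn(1.0, DBL_MAX_EXP - 1);
}

/* FLT_RADIX raised to DBL_MAX_EXP is out of range. Returns nonzero when the
   result is not a finite value no greater than DBL_MAX (assumed behavior on
   implementations that support infinities). */
int next_radix_power_overflows(void)
{
    double v = scalbn(1.0, DBL_MAX_EXP);
    return !(v <= DBL_MAX);
}
```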
June 24, 2009 v 1.2
C++
18.2.1.2p27
static const int max_exponent;
Maximum positive integer such that radix raised to the power one less than that integer is a representable
finite floating point number. 191)
Footnote 191
Equivalent to FLT_MAX_EXP, DBL_MAX_EXP, LDBL_MAX_EXP.
18.2.2p4
Header <cfloat> (Table 17): . . . The contents are the same as the Standard C library header <float.h>.
Other Languages
Fortran 90 contains the inquiry function MAXEXPONENT which performs a similar function.
Common Implementations
In IEC 60559 the value for single-precision is 128 and for double-precision 1024.
373
— maximum integer such that 10 raised to that power is in the range of representable finite floating-point
numbers, $\lfloor \log_{10}((1 - b^{-p})\, b^{e_{max}}) \rfloor$
FLT_MAX_10_EXP +37
DBL_MAX_10_EXP +37
LDBL_MAX_10_EXP +37
Commentary
As in choosing the *_MIN_10_EXP values (see 371), the Committee is being conservative.
C++
18.2.1.2p29
static const int max_exponent10;
Maximum positive integer such that 10 raised to that power is in the range of normalised floating point numbers.
Footnote 192
Equivalent to FLT_MAX_10_EXP, DBL_MAX_10_EXP, LDBL_MAX_10_EXP.
18.2.2
Header <cfloat> (Table 17): . . . The contents are the same as the Standard C library header <float.h>.
Common Implementations
The value of DBL_MAX_10_EXP is usually the same as FLT_MAX_10_EXP or LDBL_MAX_10_EXP. In the latter
case a value of 307 is often seen.
374
The values given in the following list shall be replaced by constant expressions with implementation-defined
values that are greater than or equal to those shown:
Commentary
This is a requirement on the implementation. The requirement that they be constant expressions ensures that
they can be used to initialize an object having static storage duration.
The values listed represent a floating-point number. Their equivalents in the integer domain are required
to have appropriate promoted types (see 822, symbolic name; 303, integer types sizes). There is no such
requirement specified for these floating-point values.
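The consequence of the constant-expression requirement can be shown with a small sketch; the object and function names are hypothetical.

```c
#include <float.h>

/* An object with static storage duration must be initialized by a constant
   expression; the requirement on DBL_MAX is what makes this valid. */
static const double largest_finite = DBL_MAX;

double get_largest_finite(void)
{
    return largest_finite;
}
```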
C90
C90 did not contain the requirement that the values be constant expressions.
C++
This requirement is not specified in the C++ Standard, which includes the C90 Standard by reference.
375
— maximum representable finite floating-point number, $(1 - b^{-p})\, b^{e_{max}}$
FLT_MAX 1E+37
DBL_MAX 1E+37
LDBL_MAX 1E+37
Commentary
There is no requirement that the type of the value of these macros match the real type whose maximum they
denote. Although some implementations include a representation for infinity, the definition of these macros
requires the value to be finite. These values correspond to a FLT_RADIX value of 10 and the exponent values
given by the *_MAX_10_EXP macros (see 373).
The HUGE_VAL macro value may compare larger than any of these values.
C++
18.2.1.2p4
static T max() throw();
Maximum finite value. 182)
Footnote 182
Equivalent to CHAR_MAX, SHRT_MAX, FLT_MAX, DBL_MAX, etc.
18.2.2p4
Header <cfloat> (Table 17): . . . The contents are the same as the Standard C library header <float.h>.
Other Languages
The class java.lang.Float contains the member:
public static final float MAX_VALUE = 3.4028235e+38f;
The class java.lang.Double contains the member:
public static final double MAX_VALUE = 1.7976931348623157e+308;
Fortran 90 contains the inquiry function HUGE which performs a similar function.
Common Implementations
Many implementations use a suffix to give the value a type corresponding to what the macro represents. The
IEC 60559 values of these macros are (see also 380, EXAMPLE minimum floating-point representation, and
381, EXAMPLE IEC 60559 floating-point):
single float FLT_MAX 3.40282347e+38
double float DBL_MAX 1.7976931348623157e+308
Coding Guidelines
How many calculations ever produce a value that is anywhere near FLT_MAX? The known Universe is thought
to be $3 \times 10^{29}$ mm in diameter, $5 \times 10^{19}$ milliseconds old, and to contain $10^{79}$ atoms,
while the Earth is known to have a mass of $6 \times 10^{24}$ Kg.
Floating-point values whose magnitude approaches DBL_MAX, or even FLT_MAX, are only likely to occur as
the intermediate results of calculating a final value. Very small numbers are easily created from values that
do not quite cancel. Dividing by a very small value can lead to a very large value. Very large values are thus
more often a symptom of a problem (rounding errors, or poor handling of values that almost cancel) than
an application-meaningful value.
On overflow some processors saturate to the maximum representable value, while others return infinity.
Testing whether an operation will overflow is one use for these macros; e.g., adding y to x overflows if
x > LDBL_MAX - y. In C99 the isinf macro might be used, e.g., isinf(x + y).
Example
#include <float.h>

#define FALSE 0
#define TRUE 1

extern float f_glob;

/* Updates f_glob to f_glob * p1 + p2, returning FALSE as soon as a step
   would overflow. The range checks assume that f_glob, p1, and p2 are
   all positive. */
_Bool f(float p1, float p2)
{
    if (f_glob > (FLT_MAX / p1))
        return FALSE;

    f_glob *= p1;

    if (f_glob > (FLT_MAX - p2))
        return FALSE;

    f_glob += p2;

    return TRUE;
}
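The C99 alternative mentioned above, checking the result with isinf after the operation, might look like the following sketch. The function name is hypothetical, and the code assumes finite operands and an implementation that supports infinities (e.g., IEC 60559 arithmetic); on a processor that saturates on overflow this test would never report an overflow.

```c
#include <float.h>
#include <math.h>

/* Returns nonzero if adding two finite values overflowed to an infinity. */
int add_overflowed(double x, double y)
{
    /* volatile forces the sum to be stored as a double before testing it. */
    volatile double sum = x + y;
    return isinf(sum) != 0;
}
```

Unlike the pre-check against FLT_MAX, this form needs no assumption about the signs of the operands.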
376
The values given in the following list shall be replaced by constant expressions with implementation-defined
(positive) values that are less than or equal to those shown:
Commentary
The previous discussion is applicable here (see 374, floating values listed).
377
— the difference between 1 and the least value greater than 1 that is representable in the given floating point
type, $b^{1-p}$
FLT_EPSILON 1E-5
DBL_EPSILON 1E-9
LDBL_EPSILON 1E-9
Commentary
The Committee is being very conservative in specifying these values. Although IEC 60559 arithmetic
(see 29, IEC 60559) is in common use, there are several major floating-point implementations of it that
do not support an extended precision. The Committee could not confidently expect implementations to
support the type long double containing greater accuracy than the type double.
Like the *_DIG macros (see 369, *_DIG macros), more significand digits are required for the types double
and long double.
Methods for obtaining the nearest predecessor and successor of any IEEE floating-point value are given
by Rump, Zimmermann, Boldo, Melquiond.[1210]
C++
18.2.1.2p20
static T epsilon() throw();
Machine epsilon: the difference between 1 and the least value greater than 1 that is representable. 187)
Footnote 187
Equivalent to FLT_EPSILON, DBL_EPSILON, LDBL_EPSILON.
18.2.2p4
Header <cfloat> (Table 17): . . . The contents are the same as the Standard C library header <float.h>.
Other Languages
Fortran 90 contains the inquiry function EPSILON, which performs a similar function.
Common Implementations
Some implementations (e.g., Apple) use a contiguous pair of objects having type double to represent an
object having type long double (see 368, long double Apple). Such a representation creates a second
meaning for LDBL_EPSILON. This is because, in such a representation, the least value greater than 1.0 is
1.0+LDBL_MIN, a difference of LDBL_MIN, which is not the same as $b^{1-p}$, the correct definition of
*_EPSILON. The IEC 60559 values of FLT_EPSILON and DBL_EPSILON are:
FLT_EPSILON 1.19209290e-7 /* 0x1p-23 */
DBL_EPSILON 2.2204460492503131e-16 /* 0x1p-52 */
Coding Guidelines
It is a common mistake for these values to be naively used in equality comparisons:
#define EQUAL_DBL(x, y) ((((x)-DBL_EPSILON) < (y)) && \
                         (((x)+DBL_EPSILON) > (y)))
This test will only work as expected when x is close to 1.0. The difference value not only needs to scale with
x, (x + x*DBL_EPSILON), but the value DBL_EPSILON is probably too small (equality within 1 ULP, see 346,
is a very tight bound):
#define EQUAL_DBL(x, y) ((((x)*(1.0-MY_EPSILON)) < (y)) && \
                         (((x)*(1.0+MY_EPSILON)) > (y)))
Even this test fails to work as expected if x and y are subnormal values. For instance, if x is the smallest
subnormal and y is just 1 ULP bigger, y is twice x.
Another, less computationally intensive, method is to subtract the values and check whether the result is
within some scaled approximation of zero.
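#include <math.h>

/* MY_EPSILON is an application-chosen tolerance that must be defined
   before this point. */
_Bool equalish(double f_1, double f_2)
{
    int exponent;

    /* Obtain the binary exponent of the operand with the larger magnitude. */
    frexp(((fabs(f_1) > fabs(f_2)) ? f_1 : f_2), &exponent);
    /* Scale the tolerance to that exponent before comparing the difference. */
    return (fabs(f_1-f_2) < ldexp(MY_EPSILON, exponent));
}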
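The failure of the unscaled DBL_EPSILON test away from 1.0 can be demonstrated with a short sketch. The function names and the tolerance REL_TOL (four times DBL_EPSILON) are assumptions made for illustration; an application would choose its own tolerance.

```c
#include <float.h>
#include <math.h>

/* The naive test: an absolute tolerance of DBL_EPSILON. */
int equal_naive(double x, double y)
{
    return ((x - DBL_EPSILON) < y) && ((x + DBL_EPSILON) > y);
}

/* Hypothetical relative tolerance of a few ULPs. */
#define REL_TOL (4.0 * DBL_EPSILON)

/* Relative test: the tolerance scales with the larger magnitude. */
int equal_relative(double x, double y)
{
    double larger = (fabs(x) > fabs(y)) ? fabs(x) : fabs(y);
    return fabs(x - y) <= REL_TOL * larger;
}
```

For two adjacent double values near 1e10 the naive test reports a difference, because the gap between neighbouring values at that magnitude is far larger than DBL_EPSILON, while the relative test treats them as equal.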
378
— minimum normalized positive floating-point number, $b^{e_{min}-1}$
FLT_MIN 1E-37
DBL_MIN 1E-37
LDBL_MIN 1E-37
Commentary
These values correspond to a FLT_RADIX value of 10 and the exponent values given by the *_MIN_10_EXP
macros (see 371). There is no requirement that the type of these macros match the real type whose minimum
they denote. Implementations that support subnormal numbers (see 338) will be able to represent smaller
quantities than these.
C++
18.2.1.2p1
static T min() throw();
Minimum finite value. 181)
Footnote 181
Equivalent to CHAR_MIN, SHRT_MIN, FLT_MIN, DBL_MIN, etc.
18.2.2p4
Header <cfloat> (Table 17): . . . The contents are the same as the Standard C library header <float.h>.
Other Languages
The class java.lang.Float contains the member:
public static final float MIN_VALUE = 1.4e-45f;
The class java.lang.Double contains the member:
public static final double MIN_VALUE = 5e-324;
which are the smallest subnormal, rather than normal, values.
Fortran 90 contains the inquiry function TINY which performs a similar function.
Common Implementations
Their IEC 60559 values are:
FLT_MIN 1.17549435e-38f
DBL_MIN 2.2250738585072014e-308
Implementations without hardware support for floating point sometimes choose the minimum required limits
because of the execution-time overhead in supporting additional bits in the floating-point representation.
Coding Guidelines
How many calculations ever produce a value that is anywhere near as small as FLT_MIN? The hydrogen atom
weighs $10^{-26}$ Kg and has an approximate radius of $5 \times 10^{-11}$ meters, well within limits. But
current theories on the origin of the Universe start at approximately $10^{-36}$ seconds, a very small number.
However, writers of third-party libraries might not know whether their users are simulating the Big Bang, or
weighing groceries. They need to ensure that all cases are handled.
Given everyday physical measurements, which don't have very small values, where can very small
numbers originate? Subtracting two floating-point quantities that differ by 1 ULP, for instance, produces a
value that is approximately a factor of $10^{-5}$ smaller. Such a difference can result from random fluctuations
in the values input to a program, or because of rounding errors in calculations. Producing a value that is close
to FLT_MIN invariably requires either a very complex calculation, or an iterative algorithm using values from
previous iterations. Intermediate results that are expected to produce a value of zero may in fact deliver a very
small value. Subsequent tests against zero fail and the very small value is passed through into further
calculations.
One solution to this problem is to have a relatively wide test of zeroness. In many physical systems a value
that is a factor of $10^{-6}$ smaller than the smallest measurable quantity would be considered to be zero.
Rev 378.1
Floating-point comparisons against zero shall take into account the physical properties or engineering
tolerances of the system being controlled or simulated.
There might be some uncertainty in the interpretation of the test (fabs(x) < FLT_MIN); is it an approximate
test against zero, or a test for a subnormal value? The C Standard now includes the fpclassify macro for
obtaining the classification of its argument, including subnormal.
Example
#include <math.h>

#define MIN_TOLERANCE (1e-9)

/* fabsf is used because abs takes an int argument and would truncate valu. */
static inline _Bool effectively_zero(float valu)
{
    return (fabsf(valu) < MIN_TOLERANCE);
}
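The two possible intentions behind a comparison against FLT_MIN can be written so that each is explicit. The following is a sketch with hypothetical function names; it assumes an implementation that supports subnormal numbers and does not flush them to zero.

```c
#include <float.h>
#include <math.h>

/* Intent 1: is x a subnormal value (nonzero, magnitude below FLT_MIN)? */
int is_subnormal(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Intent 2: is x close enough to zero for the application? The tolerance
   is supplied by the caller, since it depends on the system being modeled. */
int is_negligible(float x, float tolerance)
{
    return fabsf(x) < tolerance;
}
```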
Recommended practice
379
Conversion from (at least) double to decimal with DECIMAL_DIG digits and back should be the identity function.
Commentary
Why is this a recommended practice? Unfortunately many existing implementations of printf and scanf
do a poor job of base conversions, and they are not the identity functions.
To claim conformance to both C99 and IEC 60559 (Annex F in force), the requirements of F.5 Binary-decimal
conversion must be met. Just making use of IEC 60559 floating-point hardware is not sufficient.
The I/O library can still be implemented incorrectly and the conversions be wrong.
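Whether a particular library follows this recommended practice can be probed with a sketch such as the following. The function name is hypothetical; the result depends entirely on the quality of the snprintf and strtod implementations being tested, so a nonzero return for a few values is evidence, not proof.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns nonzero if value survives double -> decimal text -> double. */
int round_trips(double value)
{
    char text[64];

    /* %e prints one digit before the decimal point, so DECIMAL_DIG
       significant digits means DECIMAL_DIG - 1 digits after it. */
    snprintf(text, sizeof text, "%.*e", DECIMAL_DIG - 1, value);
    return strtod(text, NULL) == value;
}
```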
Rationale
When the radix b is not a power of 10, it can be difficult to find a case where a decimal number with
$p \times \log_{10} b$ digits fails. Consider a four-bit mantissa system (that is, base b = 2 and precision
p = 4) used to represent one-digit decimal numbers. While four bits are enough to represent one-digit numbers,
they are not enough to support the conversions of decimal to binary and back to decimal in all cases (but they
are enough for most cases). Consider a power of 2 that is just under 9.5e21, for example, $2^{73}$ = 9.44e21.
For this number, the three consecutive one-digit numbers near that special value and their round-to-nearest
representations are:
9e21     1e22     2e22
0xFp69   0x8p70   0x8p71
No problems so far; but when these representations are converted back to decimal, the values as three-digit
numbers and the rounded one-digit numbers are:
8.85e21  9.44e21  1.89e22
9e21     9e21     2e22
and we end up with two values the same. For this reason, four-bit mantissas are not enough to start with any
one-digit decimal number, convert it to a binary floating-point representation, and then convert back to the
same one-digit decimal number in all cases; and so p radix b digits are (just barely) not enough to allow any
decimal numbers with $p \times \log_{10} b$ digits to do the round-trip conversion. p radix b digits are enough,
however, for $(p - 1) \times \log_{10} b$ digits in all cases.
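The four-bit example can be reproduced with a small simulation. This is a sketch under stated assumptions: the function names are hypothetical, the host double format is assumed to hold the values involved exactly, and rounding is done by adding one half and truncating (adequate here because none of the values is a tie).

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Round a positive value to a significand of `bits` binary digits. */
double round_to_bits(double value, int bits)
{
    int exponent;
    /* value == fraction * 2^exponent, with fraction in [0.5, 1). */
    double fraction = frexp(value, &exponent);
    long significand = (long)(ldexp(fraction, bits) + 0.5);

    return ldexp((double)significand, exponent - bits);
}

/* One-digit decimal -> four-bit binary -> one-digit decimal. */
double one_digit_round_trip(double one_digit_value)
{
    char text[32];
    double binary = round_to_bits(one_digit_value, 4);

    snprintf(text, sizeof text, "%.0e", binary);
    return strtod(text, NULL);
}
```

Passing 9e21 returns 9e21, but passing 1e22 also returns 9e21, which is the collision described in the Rationale.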
The issues involved in performing correctly rounded decimal-to-binary and binary-to-decimal conversions
are discussed mathematically by Gay.[484]
C90
The Recommended practice clauses are new in the C99 Standard.
C++
There is no such macro, or requirement specified in the C++ Standard.
Other Languages
The specification of the Java base conversions is poor.
Common Implementations
Experience with testing various translators shows that the majority don’t, at the time of this publication,
implement this Recommended Practice. The extent to which vendors will improve their implementations is
unknown.
There is a publicly available set of tests for testing binary to decimal conversions.[1413]
Coding Guidelines
A Recommended Practice shall not be relied on to be followed by an implementation.
380
EXAMPLE 1 The following describes an artificial floating-point representation that meets the minimum
requirements of this International Standard, and the appropriate values in a <float.h> header for type float:
$$x = s\,16^{e} \sum_{k=1}^{6} f_k\, 16^{-k}, \quad -31 \le e \le +32$$
FLT_RADIX 16
FLT_MANT_DIG 6
FLT_EPSILON 9.53674316E-07F
FLT_DIG 6
FLT_MIN_EXP -31
FLT_MIN 2.93873588E-39F
FLT_MIN_10_EXP -38
FLT_MAX_EXP +32
FLT_MAX 3.40282347E+38F
FLT_MAX_10_EXP +38
Commentary
Note that this example has a FLT_RADIX of 16, not 2.
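The macro values in EXAMPLE 1 follow from the model parameters b = 16, p = 6, e_min = -31, and e_max = +32 using the formulas given earlier for *_EPSILON, *_MIN, and *_MAX. The following sketch recomputes them; the names are illustrative, and it assumes a host double with enough range and precision to hold the results (true of IEC 60559 double).

```c
#include <math.h>

enum { EX1_P = 6, EX1_EMIN = -31, EX1_EMAX = 32 };

/* 16 raised to the power n, computed exactly as 2 raised to 4n. */
static double sixteen_to_the(int n)
{
    return ldexp(1.0, 4 * n);
}

/* b^(1-p) */
double ex1_epsilon(void) { return sixteen_to_the(1 - EX1_P); }

/* b^(e_min - 1) */
double ex1_min(void) { return sixteen_to_the(EX1_EMIN - 1); }

/* (1 - b^(-p)) * b^(e_max) */
double ex1_max(void)
{
    return (1.0 - sixteen_to_the(-EX1_P)) * sixteen_to_the(EX1_EMAX);
}
```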
381
EXAMPLE 2 The following describes floating-point representations that also meet the requirements for
single-precision and double-precision normalized numbers in IEC 60559, 20) and the appropriate values in a
<float.h> header for types float and double:
$$x_f = s\,2^{e} \sum_{k=1}^{24} f_k\, 2^{-k}, \quad -125 \le e \le +128$$
$$x_d = s\,2^{e} \sum_{k=1}^{53} f_k\, 2^{-k}, \quad -1021 \le e \le +1024$$
FLT_RADIX 2
DECIMAL_DIG 17
FLT_MANT_DIG 24
FLT_EPSILON 1.19209290E-07F // decimal constant
FLT_EPSILON 0X1P-23F // hex constant
FLT_DIG 6
FLT_MIN_EXP -125
FLT_MIN 1.17549435E-38F // decimal constant
FLT_MIN 0X1P-126F // hex constant
FLT_MIN_10_EXP -37
FLT_MAX_EXP +128
FLT_MAX 3.40282347E+38F // decimal constant
FLT_MAX 0X1.fffffeP127F // hex constant
FLT_MAX_10_EXP +38
DBL_MANT_DIG 53
DBL_EPSILON 2.2204460492503131E-16 // decimal constant
DBL_EPSILON 0X1P-52 // hex constant
DBL_DIG 15
DBL_MIN_EXP -1021
DBL_MIN 2.2250738585072014E-308 // decimal constant
DBL_MIN 0X1P-1022 // hex constant
DBL_MIN_10_EXP -307
DBL_MAX_EXP +1024
DBL_MAX 1.7976931348623157E+308 // decimal constant
DBL_MAX 0X1.fffffffffffffP1023 // hex constant
DBL_MAX_10_EXP +308
If a type wider than double were supported, then DECIMAL_DIG would be greater than 17. For example, if the
widest type were to use the minimal-width IEC 60559 double-extended format (64 bits of precision), then
DECIMAL_DIG would be 21.
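The values 17 and 21 come from the C99 definition of DECIMAL_DIG for a radix that is not a power of 10, $\lceil 1 + p \log_{10} b \rceil$. A minimal sketch for b = 2 follows; the function name is hypothetical, and $\log_{10} 2$ is approximated as 0.30103, which is adequate for the precisions discussed here but is an assumption rather than an exact computation.

```c
/* Number of decimal digits needed so that a value with p binary significand
   digits survives conversion to decimal and back: ceil(1 + p * log10(2)),
   evaluated in integer arithmetic with log10(2) taken as 30103/100000. */
int decimal_digits_for_round_trip(int p)
{
    return 1 + (p * 30103 + 99999) / 100000;
}
```

With p = 53 this gives 17, and with p = 64 (the double-extended format mentioned above) it gives 21.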
Commentary
The values given here are important in that they are the most likely values to be provided by a conforming
implementation using IEC 60559, which is what the majority of modern implementations use. These values
correspond to the IEC 60559 single- and double-precision formats. This standard also defines extended
single and extended double formats, which contain more bits in the significand and greater range in the
exponent.
Note that this example gives the decimal and hexadecimal floating-constant representation for some of the
macro definitions. A real header will only contain one of these definitions.
C90
The C90 wording referred to the ANSI/IEEE-754–1985 standard.
382
20) The floating-point model in that standard sums powers of b from zero, so the values of the exponent
limits are one less than shown here.
Commentary
Fortran counts from 1, not 0, and much of the contents of <float.h>, in C90, came from Fortran.
383
Forward references: conditional inclusion (6.10.1), complex arithmetic <complex.h> (7.3), extended multibyte
and wide character utilities <wchar.h> (7.24), floating-point environment <fenv.h> (7.6), general utilities
<stdlib.h> (7.20), input/output <stdio.h> (7.19), mathematics <math.h> (7.12).
6. Language
6.1 Notation
384
In the syntax notation used in this clause, syntactic categories (nonterminals) are indicated by italic type,
and literal words and character set members (terminals) by bold type.
Commentary
A terminal is a token that can appear in the source code. A nonterminal is the name of a syntax rule used to
group together zero or more terminals and other nonterminals. The nonterminals can be viewed as a tree.
The root is the nonterminal translation-unit. The terminals are the leaves of this tree.
Syntax analysis is the processing of a sequence of terminals (as written in the source) via various
nonterminals until the nonterminal translation-unit is reached. Failure to reach this final nonterminal, or
encountering an unexpected sequence of tokens, is a violation of syntax.
The syntax notation used in the C Standard is not overly formal; it is often supported by text in the
semantics clause. The C syntax can be written in LALR(1) form (although some reorganization of the
productions listed in the standard is needed), assuming the typedef issue is fudged. The only way to know
whether an identifier is a typedef name or not is to look it up in a symbol table, which introduces a context
dependency; the alternative of syntactically treating a typedef name as an identifier requires more than one
token lookahead. LALR(1) also happens to be the class of grammars that can be processed by yacc and many
other parser generators.
A(B) /* Declare B to have type A, or call function A with argument B? */
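Both readings of A(B) can occur in one translation unit, depending on what the symbol table says about A at that point. The following sketch is an illustration only; all of the names are hypothetical.

```c
static int last_argument;

/* At file scope, A names a function. */
static int A(int value)
{
    last_argument = value;
    return value;
}

int call_form(void)
{
    int B = 21;

    A(B);               /* expression statement: call A with argument B */
    return last_argument;
}

int declaration_form(void)
{
    typedef int A;      /* in this block A names a type, hiding the function */

    A(B);               /* declaration: B has type A (redundant parentheses) */
    B = 7;
    return B;
}
```

The token sequence A(B); is identical in both functions; only the visible declaration of A tells a parser which production applies.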
The syntax specified in the C Standard effectively describes four different grammars:
1. A grammar whose start symbol is preprocessing-token; the input stream processed by this grammar
contains the source characters output by translation phase 2 (see 770, preprocessing token syntax; 118,
translation phase 2).
2. A grammar whose start symbol is preprocessing-file; the input stream processed by this grammar
contains the preprocessing-tokens output by translation phase 3 (see 1854, preprocessor directives syntax;
124, translation phase 3).
3. A grammar whose start symbol is token; the input to this grammar is a single preprocessing-token
(see 770, token syntax). The characters forming the preprocessing-token need to form a valid parse of the
token syntax.
4. A grammar whose start symbol is translation-unit; the input stream processed by this grammar
contains the tokens output by translation phase 6 (see 1810, translation unit syntax; 135, translation
phase 6).
The preprocessing-token and token syntax is sometimes known as the lexical grammar of C.
There are many factors that affect the decision of whether to specify language constructs using syntax
or English prose in a Constraints clause. The C Standard took the approach of having a relatively simple,
general syntax specification and using wording in constraints clauses to handle the special cases. There are
techniques available (e.g., two-level grammars) for specifying the requirements (including the type rules)