Commit 78bf064

Fix #12 - remove printable to improve footprint. (#13)
- Fix #12, breaking change. Thanks to Andyjbm for the measurements.
- remove Printable interface as it makes the effective footprint larger!
- remove getDecimals() and setDecimals().
- patch examples and unit test for the above.
- add example **float16_sizeof_array.ino**.
- add **isPosInf()** and **isNegInf()**
- add link to **float16ext** class with a larger range than float16.
- update readme.md.
- update unit-tests.
1 parent f8319b1 commit 78bf064

17 files changed, +328 -98 lines changed

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -6,6 +6,19 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/)
 and this project adheres to [Semantic Versioning](http://semver.org/).
 
 
+## [0.3.0] - 2024-04-17
+- Fix #12, breaking change. Thanks to Andyjbm for the measurements.
+- remove Printable interface as it makes the effective footprint larger!
+- remove getDecimals() and setDecimals().
+- patch examples and unit test for the above.
+- add example **float16_sizeof_array.ino**.
+- add **isPosInf()** and **isNegInf()**
+- add link to **float16ext** class with a larger range than float16.
+- update readme.md.
+- update unit-tests.
+
+----
+
 ## [0.2.0] - 2024-03-05
 - **warning: breaking changes!**
 - Fix #10, mantissa overflow

README.md

Lines changed: 103 additions & 48 deletions
@@ -16,15 +16,68 @@ Arduino library to implement float16 data type.
 ## Description
 
 This **experimental** library defines the float16 (2 byte) data type, including conversion
-function to and from float32 type. It is definitely **work in progress**.
-
-The library implements the **Printable** interface so one can directly print the
-float16 values in any stream e.g. Serial.
+function to and from float32 type.
 
 The primary usage of the float16 data type is to efficiently store and transport
 a floating point number. As it uses only 2 bytes where float and double have typical
 4 and 8 bytes, gains can be made at the price of range and precision.
 
+Note that float16 only has ~3 significant digits.
+
+To print a float16, one needs to convert it with toFloat(), toDouble() or toString(decimals).
+The latter allows concatenation and further conversion to a char array.
+
+In versions before 0.3.0 the Printable interface was implemented, but it has been removed
+as it caused excessive memory usage when declaring arrays of float16.
+
+
+#### ARM alternative half-precision
+
+- https://en.wikipedia.org/wiki/Half-precision_floating-point_format#ARM_alternative_half-precision
+
+_ARM processors support (via a floating point control register bit)
+an "alternative half-precision" format, which does away with the
+special case for an exponent value of 31 (11111₂).[10] It is almost
+identical to the IEEE format, but there is no encoding for infinity or NaNs;
+instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008._
+
+Implemented in the https://github.com/RobTillaart/float16ext class.
+
+
+#### Differences between float16 and float16ext
+
+The float16ext library has an extended range: its largest magnitude is ±131008
+instead of the ±65504 of float16.
+
+The float16ext does not support INF, -INF and NAN. These values are mapped onto
+the largest positive, the largest negative and the largest positive number respectively.
+
+The -0 and 0 values will both exist.
+
+
+Although they share a lot of code, float16 and float16ext should not be mixed.
+In the future these libraries might merge / derive one from the other.
+
+
+#### Breaking change 0.3.0
+
+Version 0.3.0 has a breaking change. The **Printable** interface is removed as
+it causes larger than expected arrays of float16 (see #16). On ESP8266 every
+float16 object was 8 bytes and on AVR it was 5 bytes instead of the expected 2 bytes.
+
+To support printing, two new conversion functions were added:
+```cpp
+f16.toFloat();
+f16.toString(decimals);
+
+Serial.println(f16.toFloat(), 4);
+Serial.println(f16.toString(4));
+```
+This keeps printing relatively easy.
+
+The footprint of the library is now smaller and one can now create compact arrays
+of float16 elements using only 2 bytes per element.
+
 
 #### Breaking change 0.2.0
 
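The ARM alternative half-precision format quoted in this hunk differs from IEEE binary16 only in how exponent 31 is decoded. As a minimal, self-contained illustration (not the float16ext source; `decodeArmAltHalf()` is a hypothetical name), exponent 31 can simply be treated as one more normal exponent, so 0x7C00 decodes to 65536 and 0x7FFF to 131008, where IEEE binary16 would give +INF and NaN.

```cpp
// Sketch only: decode a 16-bit pattern with the ARM "alternative
// half-precision" rules. Hypothetical helper, not the float16ext source.
#include <cmath>
#include <cstdint>

float decodeArmAltHalf(uint16_t bits)
{
  const bool sign     = (bits & 0x8000) != 0;
  const int  exponent = (bits >> 10) & 0x1F;   // 5 exponent bits, bias 15
  const int  mantissa = bits & 0x3FF;          // 10 mantissa bits

  float value;
  if (exponent == 0)
  {
    // zero / subnormal, same as IEEE: 2^-14 * (m / 1024) == m * 2^-24
    value = std::ldexp(static_cast<float>(mantissa), -24);
  }
  else
  {
    // normal, INCLUDING exponent 31 (no INF / NaN encodings):
    // 2^(e - 15) * (1 + m / 1024) == (1024 + m) * 2^(e - 25)
    value = std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
  }
  return sign ? -value : value;
}
```

Because no bit pattern is left for INF or NaN, such values have to be mapped onto finite numbers when converting into this format, which is what the float16ext description in this hunk states.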
@@ -34,26 +87,28 @@ For some specific values the mantissa overflowed when the float 16 was
 assigned a value to. This overflow was not detected / corrected.
 
 During the analysis of this bug it became clear that the sub-normal numbers
-were also implemented correctly. This is fixed too in 0.2.0.
+were also not implemented correctly. This is fixed too in 0.2.0.
 
-There is still an issue 0 versus -0
+There is still an issue with 0 versus -0 (sign gets lost in conversion).
 
 **This makes all pre-0.2.0 version obsolete.**
 
 
 ## Specifications
 
 
-| attribute | value | notes |
-|:----------|:-------------|:--------|
-| size | 2 bytes | layout s eeeee mmmmmmmmmm (1,5,10)
-| sign | 1 bit |
-| exponent | 5 bit |
-| mantissa | 10 bit | ~ 3 digits
-| minimum | 5.96046 E−8 | smallest positive number.
-| | 1.0009765625 | 1 + 2^−10 = smallest number larger than 1.
-| maximum | 65504 |
-| | |
+| Attribute | Value | Notes |
+|:------------|:----------------|:--------|
+| size | 2 bytes | layout s eeeee mmmmmmmmmm (1, 5, 10)
+| sign | 1 bit |
+| exponent | 5 bit |
+| mantissa | 10 bit | 3 - 4 digits
+| minimum | ±5.96046 E−8 | smallest non-zero magnitude (subnormal).
+| | ±1.0009765625 | 1 + 2^−10 = smallest magnitude larger than 1.
+| maximum | ±65504 | largest finite magnitude.
+| | |
+
+± = ALT 0177
 
 
 #### Example values
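The specification table describes the layout s eeeee mmmmmmmmmm; in IEEE 754 binary16 the exponent bias is 15. A hypothetical `decodeHalf()` function, sketched below under the assumption of standard binary16 decoding (it is not the library's f16tof32()), turns such a 2-byte pattern, e.g. a value obtained from getBinary(), into a float and reproduces the table's minimum, 1 + 2^−10 and maximum entries.

```cpp
// Sketch only: decode an IEEE 754 binary16 pattern (s eeeee mmmmmmmmmm)
// into a float. Hypothetical helper, not the library's f16tof32().
#include <cmath>
#include <cstdint>
#include <limits>

float decodeHalf(uint16_t bits)
{
  const bool sign     = (bits & 0x8000) != 0;
  const int  exponent = (bits >> 10) & 0x1F;   // 5 bits, bias 15
  const int  mantissa = bits & 0x3FF;          // 10 bits

  float value;
  if (exponent == 0)          // zero or subnormal: m * 2^-24
  {
    value = std::ldexp(static_cast<float>(mantissa), -24);
  }
  else if (exponent == 31)    // INF (m == 0) or NaN (m != 0)
  {
    value = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                            : std::numeric_limits<float>::quiet_NaN();
  }
  else                        // normal: 2^(e - 15) * (1 + m / 1024)
  {
    value = std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
  }
  return sign ? -value : value;
}
```

Note that 0x8000 decodes to -0.0f, which compares equal to 0.0f; this is the 0 versus -0 ambiguity mentioned in the 0.2.0 notes.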
@@ -87,6 +142,10 @@ Source: https://en.wikipedia.org/wiki/Half-precision_floating-point_format
 #### Related
 
 - https://wokwi.com/projects/376313228108456961 (demo of its usage)
+- https://github.com/RobTillaart/float16
+- https://github.com/RobTillaart/float16ext
+- https://github.com/RobTillaart/fraction
+- https://en.wikipedia.org/wiki/Half-precision_floating-point_format
 
 
 ## Interface
@@ -97,28 +156,35 @@ Source: https://en.wikipedia.org/wiki/Half-precision_floating-point_format
 
 #### Constructors
 
-- **float16(void)** defaults to zero.
+- **float16(void)** defaults the value to zero.
 - **float16(double f)** constructor.
 - **float16(const float16 &f)** copy constructor.
 
 
 #### Conversion
 
-- **double toDouble(void)** convert to double (or float).
+- **double toDouble(void)** convert value to double (on boards where double is the same as float, e.g. UNO, this is effectively a float).
+- **float toFloat(void)** convert value to float.
+- **String toString(unsigned int decimals = 2)** convert value to a String with decimals.
+Please note that the accuracy is only 3-4 digits for the whole number, so use decimals
+with care.
+
+
+#### Export and store
+
+To serialize the internal format, e.g. to disk, two helper functions are available.
+
 - **uint16_t getBinary()** get the 2 byte binary representation.
 - **void setBinary(uint16_t u)** set the 2 bytes binary representation.
-- **size_t printTo(Print& p) const** Printable interface.
-- **void setDecimals(uint8_t d)** idem, used for printTo.
-- **uint8_t getDecimals()** idem.
-
-Note the setDecimals takes one byte per object which is not efficient for arrays of float16.
-See array example for efficient storage using set/getBinary() functions.
 
 
 #### Compare
 
-Standard compare functions. Since 0.1.5 these are quite optimized,
-so it is fast to compare e.g. 2 measurements.
+The library implements the standard compare functions.
+These are optimized, so it is fast to compare 2 float16 values.
+
+Note: comparison with a float or double always includes a conversion.
+You can improve performance by converting e.g. a threshold only once before comparing.
 
 - **bool operator == (const float16& f)**
 - **bool operator != (const float16& f)**
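The compare section states that float16 comparisons are optimized and that comparing against a float or double costs an extra conversion. One common way to compare two binary16 values without converting them to float is to map the sign-magnitude bit pattern onto an integer key that sorts in numeric order. The helpers below (`halfOrderKey()`, `halfLess()` and `halfEqual()` are hypothetical names) sketch that idea on raw 16-bit patterns; they are not claimed to be how the float16 operators are implemented, and NaN inputs must be filtered out before calling them.

```cpp
// Sketch only: order two binary16 bit patterns without converting to float.
// Hypothetical helpers; NaN inputs are NOT handled (check for NaN first).
#include <cstdint>

// For non-negative values the 15 magnitude bits already sort numerically
// (subnormals < normals < INF); negative values get a negated key.
// +0 (0x0000) and -0 (0x8000) both map to 0, so they compare equal.
int32_t halfOrderKey(uint16_t bits)
{
  const int32_t magnitude = bits & 0x7FFF;
  return (bits & 0x8000) ? -magnitude : magnitude;
}

bool halfLess(uint16_t a, uint16_t b)
{
  return halfOrderKey(a) < halfOrderKey(b);
}

bool halfEqual(uint16_t a, uint16_t b)
{
  return halfOrderKey(a) == halfOrderKey(b);
}
```

Treating 0x0000 and 0x8000 as equal is a design choice of this sketch; the README still lists the 0 == -0 question as open. With a threshold converted to its 16-bit pattern once, a loop over many samples only performs integer comparisons.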
@@ -143,20 +209,16 @@ Not planned to optimize these.
 - **float16& operator \*= (const float16& f)**
 - **float16& operator /= (const float16& f)**
 
-negation operator.
+Negation operator.
 - **float16 operator - ()** fast negation.
 
+Math helpers.
 - **int sign()** returns 1 == positive, 0 == zero, -1 == negative.
 - **bool isZero()** returns true if zero. slightly faster than **sign()**.
-- **bool isInf()** returns true if value is (-)infinite.
-
-
-#### Experimental 0.1.8
-
-- **bool isNaN()** returns true if value is not a number.
-
-
-## Notes
+- **bool isNaN()** returns true if value is not a number.
+- **bool isInf()** returns true if value is ± infinite.
+- **bool isPosInf()** returns true if value is + infinite.
+- **bool isNegInf()** returns true if value is - infinite.
 
 
 ## Future
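The math helpers isNaN(), isInf(), isPosInf() and isNegInf() can all be answered from the bit pattern alone. The sketch below assumes the IEEE 754 binary16 encodings (+INF = 0x7C00, -INF = 0xFC00, NaN = all exponent bits set with a non-zero mantissa) and uses hypothetical free functions on a uint16_t instead of the library's member functions. It also shows why negation can be fast: it only flips the sign bit.

```cpp
// Sketch only: classification helpers on a raw binary16 pattern.
// Hypothetical free functions, not the float16 library source.
#include <cstdint>

constexpr uint16_t HALF_SIGN_MASK     = 0x8000;
constexpr uint16_t HALF_EXPONENT_MASK = 0x7C00;
constexpr uint16_t HALF_MANTISSA_MASK = 0x03FF;

bool halfIsNaN(uint16_t b)
{
  return (b & HALF_EXPONENT_MASK) == HALF_EXPONENT_MASK && (b & HALF_MANTISSA_MASK) != 0;
}

bool halfIsInf(uint16_t b)    { return (b & 0x7FFF) == 0x7C00; }  // + or - INF
bool halfIsPosInf(uint16_t b) { return b == 0x7C00; }
bool halfIsNegInf(uint16_t b) { return b == 0xFC00; }
bool halfIsZero(uint16_t b)   { return (b & 0x7FFF) == 0; }       // +0 and -0

// 1 == positive, 0 == zero, -1 == negative (NaN is not special-cased here).
int halfSign(uint16_t b)
{
  if (halfIsZero(b)) return 0;
  return (b & HALF_SIGN_MASK) ? -1 : 1;
}

// Negation only flips the sign bit; no conversion to float is needed.
uint16_t halfNegate(uint16_t b)
{
  return static_cast<uint16_t>(b ^ HALF_SIGN_MASK);
}
```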
@@ -167,26 +229,19 @@ negation operator.
 
 #### Should
 
-- unit tests of the above.
 - how to handle 0 == -0 (0x0000 == 0x8000)
-- investigate ARM alternative half-precision
-_ARM processors support (via a floating point control register bit)
-an "alternative half-precision" format, which does away with the
-special case for an exponent value of 31 (111112).[10] It is almost
-identical to the IEEE format, but there is no encoding for infinity or NaNs;
-instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008._
-
 
 #### Could
 
-- copy constructor?
-- update documentation.
+- unit tests.
 - error handling.
 - divide by zero errors.
 - look for optimizations.
 - rewrite **f16tof32()** with bit magic.
-- add storage example - with SD card, FRAM or EEPROM
-- add communication example - serial or Ethernet?
+- add examples
+- persistent storage e.g. SD card, FRAM or EEPROM.
+- communication e.g. Serial or Ethernet (XML, JSON)?
+- sorting an array of float16?
 
 #### Wont
 
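The README's Export and store section names getBinary() and setBinary() for serializing the internal format, and the Could list mentions storage and communication examples. The byte-level part can be sketched as below, assuming the uint16_t values were obtained with getBinary() and that little-endian byte order is used on both the writing and the reading side; `packHalfArray()` and `unpackHalfArray()` are hypothetical helpers, not library functions.

```cpp
// Sketch only: pack 2-byte float16 patterns (e.g. from getBinary()) into a
// byte buffer and back. Hypothetical helpers; little-endian order is an
// assumption, the reader just has to use the same order as the writer.
#include <cstddef>
#include <cstdint>

void packHalfArray(const uint16_t* values, size_t count, uint8_t* buffer)
{
  for (size_t i = 0; i < count; i++)
  {
    buffer[2 * i]     = static_cast<uint8_t>(values[i] & 0xFF);   // low byte
    buffer[2 * i + 1] = static_cast<uint8_t>(values[i] >> 8);     // high byte
  }
}

void unpackHalfArray(const uint8_t* buffer, size_t count, uint16_t* values)
{
  for (size_t i = 0; i < count; i++)
  {
    values[i] = static_cast<uint16_t>(buffer[2 * i] | (buffer[2 * i + 1] << 8));
  }
}
```

The buffer holds exactly 2 bytes per element, so it can be written to EEPROM, FRAM or a Serial stream as-is; after reading it back, each uint16_t can be passed to setBinary().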
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+//
+// FILE: float16_sizeof_array.ino
+// AUTHOR: Rob Tillaart
+// PURPOSE: test float16 size
+// URL: https://github.com/RobTillaart/float16
+// See #12
+
+#include "Arduino.h"
+#include "float16.h"
+
+
+float16 test16[100];
+float test32[100];
+
+void setup()
+{
+  Serial.begin(115200);
+
+  Serial.println("FLOAT16");
+  Serial.println(sizeof(test16) / sizeof(test16[0]));
+  Serial.println(sizeof(test16));
+  Serial.println(sizeof(test16[0]));
+  Serial.println();
+
+  Serial.println("FLOAT32");
+  Serial.println(sizeof(test32) / sizeof(test32[0]));
+  Serial.println(sizeof(test32));
+  Serial.println(sizeof(test32[0]));
+  Serial.println();
+
+  // set some values to make sure the compiler doesn't optimise out the arrays.
+  test16[5] = 32;
+  test32[4] = 32;
+
+  // Serial.println(test16[5].toDouble(), 3);
+  // Serial.println(test16[5].toFloat(), 3);
+  // Serial.println(test16[5].toString());
+  // Serial.println(test16[5].toString(1));
+  // Serial.println(test16[5].toString(3));
+}
+
+void loop()
+{
+}
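The sketch above prints sizeof() of a float16 array. The growth fixed in 0.3.0 can be reproduced without the library: a class that implements an interface with a virtual function (Printable's printTo() is virtual) carries a hidden vtable pointer next to its data. The structs below are illustrative stand-ins, not the real Printable or float16 classes, and the pre-0.3.0 layout (interface + 2 value bytes + 1 decimals byte) is an assumption based on the removed setDecimals() note. With a 2-byte pointer on AVR, 2 + 2 + 1 = 5 bytes is consistent with the README figure; on ESP8266 a 4-byte pointer plus alignment padding would be consistent with 8 bytes. Exact numbers depend on compiler and target.

```cpp
// Illustrative stand-ins only, not the real Printable / float16 classes.
#include <cstdint>

// Like Arduino's Printable: an interface with one pure virtual function.
struct PrintableLike
{
  virtual int printTo() const = 0;
};

// Assumed pre-0.3.0 shape: interface (hidden vtable pointer)
// + 2 value bytes + 1 decimals byte.
struct HalfPre030 : PrintableLike
{
  uint16_t bits = 0;
  uint8_t decimals = 2;
  int printTo() const override { return 0; }
};

// 0.3.0 shape: only the 2 value bytes, no virtual functions.
struct Half030
{
  uint16_t bits = 0;
};
```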

examples/float16_test_all/float16_test_all.ino

Lines changed: 0 additions & 2 deletions
@@ -29,8 +29,6 @@ void setup()
   Serial.println(FLOAT16_LIB_VERSION);
   Serial.println("\nStart ");
 
-  f16.setDecimals(6);
-
   test_1();
   test_2();
   test_3();

examples/float16_test_all_2/float16_test_all_2.ino

Lines changed: 1 addition & 3 deletions
@@ -24,8 +24,6 @@ void setup()
   Serial.print("FLOAT16_LIB_VERSION: ");
   Serial.println(FLOAT16_LIB_VERSION);
 
-  f16.setDecimals(6);
-
   test_all();
 
   Serial.println("\ndone");
@@ -96,7 +94,7 @@ void test_0()
     f16 = x;
     Serial.print(x);
     Serial.print("\t");
-    Serial.print(f16);
+    Serial.print(f16.toString(2));
     Serial.print("\t");
     Serial.print(f16.toDouble(), 2);
     Serial.print("\t");

examples/float16_test_array/float16_test_array.ino

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 // URL: https://github.com/RobTillaart/float16
 
 
-// show different storage needs
+// show storage needs (fixed in 0.3.0)
 
 #include "float16.h"
 
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+
+float16_test_array.ino
+FLOAT16_LIB_VERSION: 0.3.0
+
+0 5.07
+1 -0.51
+2 -2.27
+3 3.58
+4 6.30
+5 -0.28
+6 2.44
+7 5.78
+8 6.23
+9 4.09
+0.30
+
+0 5.07
+1 -0.51
+2 -2.27
+3 3.58
+4 6.30
+5 -0.28
+6 2.44
+7 5.78
+8 6.23
+9 4.09
+0.30
+
+SIZE: 20
+SIZE: 20
+
+done

examples/float16_test_performance/float16_test_performance.ino

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 //
 // FILE: float16_test_performance.ino
 // AUTHOR: Rob Tillaart
-// PURPOSE: test float16
+// PURPOSE: test float16 performance
 // URL: https://github.com/RobTillaart/float16
 
 
@@ -162,7 +162,7 @@ void setup()
   delay(10);
   Serial.println();
 
-  Serial.println(f16);
+  Serial.println(f16.toString(4));
 
   Serial.println("MATH III - negation");
   start = micros();
@@ -173,7 +173,7 @@ void setup()
   delay(10);
   Serial.println();
 
-  Serial.println(f18);
+  Serial.println(f18.toString(4));
 
   Serial.println("\ndone");
 }
