ActiveState Community

a missing 0xfe

Posted by umb365 on 2009-03-10 08:11

Hi, all.
I found something interesting.
Big-endian and little-endian Unicode text files begin with \xfe\xff or \xff\xfe, respectively.
When I try to generate these files, it doesn't work properly.

# demo TCL code ############################################
# big endian
set dstFile [ open feff.txt w ]
fconfigure $dstFile -encoding binary
puts -nonewline $dstFile [ encoding convertfrom \xfe\xff ] ;# correct!
puts -nonewline $dstFile [ encoding convertfrom \x00\x30 ]
close $dstFile

# little endian
set dstFile [ open fffe.txt w ]
fconfigure $dstFile -encoding binary
puts -nonewline $dstFile [ encoding convertfrom \xff\xfe ] ;# error!
puts -nonewline $dstFile [ encoding convertfrom \x30\x00 ]
close $dstFile

# end code ############################################

In feff.txt, viewed in hex mode:
FE FF 00 30

In fffe.txt, viewed in hex mode:
FF 30 00

The 0xfe is missing!
Why?

jeffh | Tue, 2009-03-10 10:46

I cannot reproduce this on either Windows or Linux. Data output is:

Z:\tmp>od -x feff.txt
0000000 fffe 3000
0000004

Z:\tmp>od -x fffe.txt
0000000 feff 0030
0000004

umb365 | Tue, 2009-03-10 18:30

But I tried this code on some PCs running Windows XP Pro.
On all of them, fffe.txt contains "FF 30 00".
Are you using Windows XP Pro?
My Tcl version is 8.5.

jeffh | Tue, 2009-03-10 18:51

The machine is Vista 64-bit, but the build is standard 32-bit ActiveTcl 8.5.6. I don't see how it could be system specific. I am using msys's 'od' command as well. How are you viewing the hex?
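
A note on comparing the two hex views: od -x prints 16-bit words in the host's byte order, so on a little-endian machine the bytes FE FF 00 30 appear as "fffe 3000". A byte-by-byte dump avoids that confusion. The following is a minimal sketch in Tcl; the proc name hexDump is hypothetical, not part of Tcl or ActiveTcl.

```tcl
# Hypothetical helper: return the bytes of a file as space-separated hex pairs,
# in the order they are stored on disk.
proc hexDump {path} {
    set f [open $path r]
    # Binary translation: no encoding conversion, no end-of-line translation.
    fconfigure $f -translation binary
    set data [read $f]
    close $f
    # H* yields two hex digits per byte, high nibble first.
    binary scan $data H* hex
    # Put a space after every pair of digits, then drop the trailing space.
    return [string toupper [string trim [regsub -all {..} $hex {& }]]]
}
```

For example, "puts [hexDump fffe.txt]" prints one hex pair per byte in file order, which can be compared directly with a hex editor's view.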

umb365 | Tue, 2009-03-10 23:05

I view the text file with UltraEdit 10.00.

I did some more testing.
On Windows 2000 Pro with ActiveTcl 8.4, the result is the same.

BUT!
If I change the code like this:
puts -nonewline $dstFile [ encoding convertfrom \xff\xfe\x30\x00 ]
the result is correct:
00000000h: FF FE 30 00

So, my guess is:
1. The \xfe should not be the last char of the string.
2. This bug may only appear on some non-English Windows versions (I use a Simplified Chinese copy).

I hope somebody who uses a non-English Windows can do me a favour and test the code above.

patthoyts | Thu, 2009-03-12 07:03

The bug as raised is nonsense. Given that you want the bytes \xff\xfe as the first two bytes of the file, you cannot convert them from the system encoding. So where you have:
puts -nonewline $dstFile [encoding convertfrom \xff\xfe]
you should just do:

puts -nonewline $dstFile "\xff\xfe"
puts -nonewline $dstFile [encoding convertfrom $moredata]

However, a better solution is:

set f [open example.txt w]
fconfigure $f -encoding binary
puts -nonewline $f "\xff\xfe"
fconfigure $f -encoding unicode
puts $f "more data that will become unicode as it is written"
close $f

With no encoding argument, the [encoding convertfrom] command converts from the system encoding into Tcl's internal utf-8 representation. You are unlikely to want to write that utf-8 into the file, so you need [encoding convertto unicode] to convert the data into proper unicode. To write big-endian unicode, you should then byteswap the result and emit it on a binary channel. For little-endian, the above example is sufficient.
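
The big-endian case described above might look something like the sketch below. It assumes Tcl 8.5/8.6, where [encoding convertto unicode] produces 16-bit units in the machine's native byte order, and it only handles characters in the Basic Multilingual Plane. The proc name writeUtf16BE is hypothetical.

```tcl
# Hypothetical helper: write $text to $path as big-endian 16-bit unicode,
# preceded by the FE FF byte order mark.
proc writeUtf16BE {path text} {
    # Assumption: "unicode" yields 16-bit units in native byte order (Tcl 8.5/8.6).
    set units16 [encoding convertto unicode $text]
    if {$::tcl_platform(byteOrder) eq "littleEndian"} {
        # Read the units as little-endian shorts and re-emit them big-endian.
        binary scan $units16 s* units
        set units16 [binary format S* $units]
    }
    set f [open $path w]
    # Binary channel: the bytes are written exactly as given.
    fconfigure $f -translation binary
    puts -nonewline $f "\xfe\xff"
    puts -nonewline $f $units16
    close $f
}
```

Calling "writeUtf16BE feff.txt 0" should give the four bytes FE FF 00 30, which is the big-endian file the original demo code was trying to produce.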

umb365 | Thu, 2009-03-12 09:18

Well Well Well! It's Pat Thoyts. :-D
I think I get your point.
"encoding convertfrom" can't map the \xfe to the correct char, right?

But why does [ encoding convertfrom \xff\xfe\x30\x00 ] produce the correct byte stream?

I don't want to challenge you, I'm just a little curious. :-P

patthoyts | Thu, 2009-03-12 11:19

Luck. When I checked this I used a Russian system. When you use [encoding convertfrom $string] it uses the system encoding. On your system that happens to convert OK. On a Russian Windows XP the system encoding is cp1251 and we get:

% binary encode hex [encoding convertfrom cp1251 \xff\xfe\x30\x00]
4f4e3000
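
To see the dependence on the encoding directly, the same bytes can be decoded under explicitly named encodings and the resulting code points printed. This is a rough sketch assuming Tcl 8.5/8.6 behaviour, where invalid byte sequences do not raise an error. The proc name codePoints is hypothetical.

```tcl
# Hypothetical helper: decode $bytes using the named encoding and return
# the resulting characters as a list of U+XXXX code points.
proc codePoints {enc bytes} {
    set cps {}
    foreach c [split [encoding convertfrom $enc $bytes] ""] {
        # [scan $c %c] gives the integer code point of a single character.
        lappend cps [format U+%04X [scan $c %c]]
    }
    return $cps
}

# The encoding that [encoding convertfrom] falls back to when none is named.
puts "system encoding: [encoding system]"

# Same two bytes, three different interpretations.
foreach enc {iso8859-1 cp1251 cp936} {
    puts [format "%-10s -> %s" $enc [codePoints $enc "\xff\xfe"]]
}
```

Under cp1251 the two bytes decode to the Cyrillic letters U+044F and U+044E, and writing those to a binary channel keeps only the low byte of each, which is where the 4f4e in the output above comes from.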

umb365 | Mon, 2009-03-16 17:49

Thanks, jeffh & patthoyts.
I think I understand most of what you mentioned.
But I should think it over and maybe test some more.
I will post a new comment if I have any more questions.
Thanks again. :-)