Creating Unicode UTF-16LE files in Windows

Posted by huw_walters on 2006-10-03 16:27

Hi, I'm trying to create UTF-16LE text files in Windows (files exported by Registry Editor are unsorted, so I'm writing a script to sort them in-place, making them easier to glance through).

I'm using the excellent CPAN module File::BOM. Its defuse function reads the BOM [FF FE] from the input file and returns "UTF-16LE". That name is passed back as ">:encoding(UTF-16LE):via(File::BOM)" when the file is recreated, so the module rewrites the BOM on the first data write.

This prints "AB" as [41 00 42 00] as expected. However, it prints "\n" as [0D 0A 00] instead of [0D 00 0A 00], so the output file is not valid UTF-16LE. Anyone have any ideas?

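use File::BOM qw(defuse);
...
open my $in, "<", $path or die "$!: $path\n";
my $enc = defuse($in);    # returns "UTF-16LE" for a [FF FE] BOM
...
open my $out, ">:encoding($enc):via(File::BOM)", $path or die "$!: $path\n";
print $out "AB\n";
print $out "CD\n";

This produces the following bytes:

[FF FE] BOM
[41 00] A
[42 00] B
[0D 0A 00] OOPS
[43 00] C
[44 00] D
[0D 0A 00] OOPS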
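
For anyone wanting to check the bytes of such a file themselves, here is a minimal sketch of a hex dump in Perl. The helper name hex_dump is made up for this illustration and is not part of File::BOM or any other module; the file is read with the :raw layer so that no newline translation or decoding affects what is shown.

```perl
use strict;
use warnings;

# Return the bytes of a file as space-separated uppercase hex pairs.
# hex_dump is a hypothetical helper name used only for this example.
sub hex_dump {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "$!: $path\n";
    local $/;                 # slurp mode: read the whole file at once
    my $bytes = <$fh>;
    close $fh;
    return uc join ' ', unpack '(H2)*', $bytes;
}

# Dump the file named on the command line, if one was given.
print hex_dump($ARGV[0]), "\n" if @ARGV;
```

Run against the file described above, this should show the stray 0D 0A 00 sequences where each line ends.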

kevinw (ActiveState Staff) | Fri, 2006-10-13 13:28

In addition to these forums, there is also a set of mailing lists on various Perl topics that can be good references and places to learn how to use various Perl modules. You may wish to check the Perl-Win32-Users mailing list for this question; chances are someone there has done this before and has some good information for you.

You can find out more about the mailing lists and subscribe to them at:

http://aspn.activestate.com/ASPN/Mail/

Cheers,

kjw

huw_walters | Sun, 2006-10-15 11:36

Hi, thanks for the suggestion. I'll certainly try that, but I've now managed to narrow the problem down to this minimal example:

use strict;
my $path = shift or die "Expected filename\n";
open my $out, ">", $path or die "$!: $path\n";
binmode $out, ":encoding(UTF-16LE)";
print $out "AB\n";
print $out "CD\n";

So it isn't actually a problem in File::BOM, but rather in core Perl I/O.
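
The same bytes can be reproduced on any platform by stacking the layers explicitly. This is a minimal sketch, assuming that on Windows the default :crlf layer sits underneath the :encoding layer added by binmode, so the newline is first encoded to 0A 00 and the 0A byte is then expanded to 0D 0A. The helper name bytes_written and the use of a temporary file are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text through the given layer string and return the file's bytes
# as hex. bytes_written is a hypothetical helper for this illustration.
sub bytes_written {
    my ($layers, $text) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;

    open my $out, ">$layers", $path or die "$!: $path\n";
    print $out $text;
    close $out or die "$!: $path\n";

    open my $in, '<:raw', $path or die "$!: $path\n";
    local $/;                 # slurp mode
    my $bytes = <$in>;
    close $in;
    return uc join ' ', unpack '(H2)*', $bytes;
}

# :crlf below :encoding mimics the layer stack assumed for Windows above:
# the CRLF translation runs on the already-encoded bytes.
print bytes_written(':raw:crlf:encoding(UTF-16LE)', "AB\n"), "\n";
```

With the layers in this order the CRLF translation operates on bytes rather than characters, which is why only the 0A byte gains a 0D in front of it.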

huw_walters | Sun, 2006-10-15 15:18

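Yes, that answered my question, thanks. I just needed to add a couple of extra layers (disciplines) on the output file: ">:raw:encoding(UTF-16LE):crlf:via(File::BOM)" instead of ">:encoding(UTF-16LE):via(File::BOM)". The :raw removes the default :crlf layer that was expanding the 0A byte after encoding, and the :crlf placed after :encoding translates "\n" to "\r\n" while the data is still characters, before it is encoded.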
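
For reference, here is a minimal sketch of the corrected layer order applied to the earlier test case. It makes two assumptions that are not in the posts above: the BOM is written by hand as U+FEFF instead of through :via(File::BOM), so that the example needs only core Perl, and the sub name write_utf16le_crlf is invented for the illustration.

```perl
use strict;
use warnings;

# Write the given strings as UTF-16LE with a BOM and CRLF line endings.
# write_utf16le_crlf is a hypothetical name used only in this sketch.
sub write_utf16le_crlf {
    my ($path, @lines) = @_;

    # :raw clears any default :crlf layer, :encoding turns characters into
    # UTF-16LE bytes, and the :crlf pushed on top expands "\n" to "\r\n"
    # while the data is still characters.
    open my $out, '>:raw:encoding(UTF-16LE):crlf', $path
        or die "$!: $path\n";

    print $out "\x{FEFF}";    # BOM written by hand (File::BOM not used here)
    print $out $_ for @lines;
    close $out or die "$!: $path\n";
}

# Recreate the two-line test file if a filename was given.
write_utf16le_crlf($ARGV[0], "AB\n", "CD\n") if @ARGV;
```

Each "\n" should now come out as [0D 00 0A 00], after the [FF FE] BOM at the start of the file.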