
Big endian problem

Posted: 10 Dec 2016, 00:55
by jamesjer
The build of polymake 3.0r2 is failing on all big endian architectures supported by Fedora. I just tried with 3.1beta2, with no change. Here is a typical failure, from a ppc64 build:

Code:

/usr/bin/perl perl/polymake --ignore-config --script generate_docs /home/fedora/rpmbuild/BUILDROOT/polymake-3.1-0.beta2.fc25.ppc64/usr/share/polymake/doc fulton polytope group graph ideal fan tropical matroid common topaz
Attempt to free unreferenced scalar: SV 0x10031bdeda0, Perl interpreter: 0x100319d0010.
Makefile:186: recipe for target 'release-docs' failed
make: *** [release-docs] Segmentation fault (core dumped)
Some work with gdb and valgrind shows that a bad pointer is somehow getting onto the savestack. When the crash occurs, a pointer to a yy_parser has just been removed from the savestack, leaving the savestack empty. But the yy_parser is full of bogus values, suggesting that it was either freed already, or wasn't really pointing to a yy_parser at all. Here is a typical valgrind message printed when the pointer is first dereferenced:

Code:

Attempt to free unreferenced scalar: SV 0x519f720, Perl interpreter: 0x4830040.
==15318== Invalid read of size 8
==15318==    at 0x4294E54: PerlIO__close (perlio.c:1356)
==15318==    by 0x4294F33: Perl_PerlIO_close (perlio.c:1372)
==15318==    by 0x416FC1F: Perl_parser_free (toke.c:763)
==15318==    by 0x423963B: Perl_leave_scope (scope.c:1303)
==15318==    by 0x423A007: Perl_pop_scope (scope.c:122)
==15318==    by 0x4162523: S_parse_body (perl.c:2355)
==15318==    by 0x4162523: perl_parse (perl.c:1650)
==15318==    by 0x10000CCF: main (perlmain.c:114)
==15318==  Address 0x108011101 is not stack'd, malloc'd or (recently) free'd
Address 0x108011101 is the value of parser->rsfp. Since valgrind didn't complain about dereferencing the pointer to the parser (0x519f720), presumably that is valid memory; i.e., it has not been freed. Probably it is not a parser, but some other kind of object, and somehow it got into savestack[0] in such a way that perl misinterprets its type when it is popped. Since savestack is touched by namespaces.xs, CPlusPlus.xxs, and Scope.xs, I have tried to find a big-endian-specific problem in those files, but so far have failed. If you have any ideas how I could debug this issue, I would be most grateful.

A most likely unrelated issue: while looking for the cause of this bug, I noticed that CPlusPlus.xxs contains a copy of struct magic_state from perl's mg.c. However, the definition in CPlusPlus.xxs does not match the definition in perl 5.24.0's mg.c. In particular, the order of the fields mgs_flags and mgs_ss_ix is reversed.
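
To illustrate why a reversed field order matters, here is a minimal sketch. The two structs are hypothetical two-member reductions that keep only the field names mentioned above; the real struct magic_state has more members, and I am assuming both fields are 32 bits wide. The names layout_a, layout_b and reinterpret_state are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reductions of two disagreeing definitions of the same struct.
   Only the order of the two members differs. */
struct layout_a { int32_t  mgs_ss_ix; uint32_t mgs_flags; };
struct layout_b { uint32_t mgs_flags; int32_t  mgs_ss_ix; };

/* Fill the struct using one definition, then view the same bytes through
   the other definition, as code compiled against a stale copy would. */
static struct layout_b reinterpret_state(int32_t ss_ix, uint32_t flags)
{
    struct layout_a written = { ss_ix, flags };
    struct layout_b seen;

    assert(sizeof written == sizeof seen);
    memcpy(&seen, &written, sizeof seen);   /* same bytes, different view */
    return seen;
}
```

With these assumptions, reinterpret_state(7, 0x10) comes back with mgs_flags equal to 7 and mgs_ss_ix equal to 0x10: each field silently picks up the other's value, and nothing warns about it at compile time.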

Re: Big endian problem

Posted: 12 Jan 2017, 17:09
by jamesjer
I managed to identify the exact 3 files with the problem! The issue is that, starting with perl 5.14, Perl_leave_scope() started reading the type via the any_uv field, instead of any_i32. On 64-bit big endian machines, this means that the polymake code that writes the type via any_i32 is writing to the high-order 32 bits of the field, while perl looks for the type in the low-order bits, leading to the problems I described. I will attach a patch against 3.1beta2 to fix the problem. Since perl 5.16 seems to be the minimum requirement for polymake now, I did not conditionalize on perl version.
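
The overlap can be modelled with a small sketch. This is not perl's actual union; slot_t, byte_order_t and the helper functions are hypothetical names, and the byte order is passed in explicitly so the effect can be reproduced on any host. The tag value used below is arbitrary, not a real save type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 64-bit savestack slot: 8 raw bytes that can
   be viewed as a 32-bit or as a 64-bit integer, both starting at offset 0. */
typedef struct { unsigned char bytes[8]; } slot_t;

typedef enum { ORDER_LITTLE, ORDER_BIG } byte_order_t;

/* Store 32 bits at offset 0, as a write through a 32-bit member would. */
static void slot_write_u32(slot_t *s, uint32_t v, byte_order_t order)
{
    assert(order == ORDER_LITTLE || order == ORDER_BIG);
    for (int i = 0; i < 4; i++) {
        int shift = (order == ORDER_BIG) ? 8 * (3 - i) : 8 * i;
        s->bytes[i] = (unsigned char)(v >> shift);
    }
}

/* Load all 8 bytes, as a read through a 64-bit member would. */
static uint64_t slot_read_u64(const slot_t *s, byte_order_t order)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        int shift = (order == ORDER_BIG) ? 8 * (7 - i) : 8 * i;
        v |= (uint64_t)s->bytes[i] << shift;
    }
    return v;
}

/* Write a tag as 32 bits into a zeroed slot, then read it back as 64 bits. */
static uint64_t tag_roundtrip(uint32_t tag, byte_order_t order)
{
    slot_t s;
    memset(&s, 0, sizeof s);
    slot_write_u32(&s, tag, order);
    return slot_read_u64(&s, order);
}
```

In this model, tag_roundtrip(0x2A, ORDER_LITTLE) gives back 0x2A, while tag_roundtrip(0x2A, ORDER_BIG) gives 0x2A shifted left by 32 bits, so a reader of the 64-bit view sees a completely different value than the one written. That would explain why little endian builds never noticed.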

A Fedora build with a version of this patch for polymake 3.0r2 succeeded on all architectures.

Re: Big endian problem

Posted: 12 Jan 2017, 17:11
by jamesjer
Also, I remarked on a change to struct magic_state. That structure changed in perl 5.12, 5.14, and 5.22. I will attach a patch to fix it up as well, once again ignoring perl versions less than 5.16.

Re: Big endian problem

Posted: 12 Jan 2017, 22:21
by gawrilow
Great, thanks a lot! You must have spent plenty of time digging so deep in perl guts. Big endian systems must have become rather exotic nowadays, since this bug could stay undetected for several years...