Some latex2html patches

Daniel Clemente Laboreo

When I was writing my first LaTeX document (in 2004, about natural deduction), I found several bugs in latex2html, which I managed to fix after a long struggle with LyX, TeX and Perl.

These are two major bugs, both of which have now (August 2005) been fixed: new versions of latex2html and Mozilla/Firefox won't have these problems.

Incomplete tables (only the header is present)

Some tables fail to render correctly if they come from a LyX document. This is because a command substitution (done with \providecommand) does not take place. Try it with this TeX example.

The fix was posted to the latex2html mailing list a long time ago, but was somehow ignored. Ross Moore confirmed that the patch will be included in the next version of l2h.

This one-line patch can be applied to version 1.70 (manually, if you prefer) to solve it.

  --- /usr/bin/latex2html 2004-08-27 15:30:15.000000000 +0200
  +++ latex2html  2005-03-06 03:01:56.000000000 +0100
  @@ -5276,6 +5276,12 @@ sub substitute_meta_cmds {
              elsif ($this_cmd) { push(@pieces, $this_cmd) }
            }
            push(@pieces, $after);
  +
  +          # added by DCL (patch from Ross Moore).
  +         # See http://www.tug.org/pipermail/latex2html/2004-February/002640.html
  +         # 
  +         # after the first segment we should no longer be in the preamble.
  +         $within_preamble = 0;
          }
          print " $replacements new-command replacements\n"
              if (($VERBOSITY>1) && $replacements);
  

Large margin on the bottom of some images

Some inline formulas were generated as images with a large empty margin at the bottom. At first I thought latex2html hadn't trimmed the bottom margin, but the padding is actually needed here.

The problem:
Some letters have descenders: they drop a little below the baseline of the text, like g, p, y or Q. If every image were cropped at the lowest pixel of its letters, the baselines of the images would not line up.
What latex2html does:
If the formula has no descender, it uses ALIGN="BOTTOM" on that IMG, which makes the bottom of the image align with the baseline of the text.
If it has a descender, it adds padding at the bottom so that the baseline of the formula lies exactly at the center of the image, and then uses ALIGN="MIDDLE", which, according to the HTML specification (latex2html uses HTML 3.2), makes the middle of the image align with the baseline of the surrounding text.

This simple TeX document can be used to check the two cases. See also this post.
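To make the two cases concrete, here is a small Python model of the placement rule described above. It is only a sketch, not latex2html's actual code (which does this in TeX and Perl); the names `InlineImage` and `place_inline_formula` are hypothetical, and padding the top when a formula is deeper than it is tall is my own assumption, since the text only describes bottom padding.

```python
from dataclasses import dataclass


@dataclass
class InlineImage:
    ascent: int          # pixels above the formula's baseline
    depth: int           # pixels below the baseline (descenders)
    pad_top: int = 0
    pad_bottom: int = 0
    align: str = "BOTTOM"

    @property
    def height(self) -> int:
        return self.pad_top + self.ascent + self.depth + self.pad_bottom


def place_inline_formula(ascent: int, depth: int) -> InlineImage:
    """Model of the two cases above (hypothetical helper, not latex2html code)."""
    if depth == 0:
        # No descender: the image bottom is the baseline, so ALIGN="BOTTOM" works.
        return InlineImage(ascent, depth, align="BOTTOM")
    if ascent >= depth:
        # Pad the bottom until the baseline sits at the vertical center,
        # then rely on HTML 3.2 ALIGN="MIDDLE" (image middle on the text baseline).
        return InlineImage(ascent, depth, pad_bottom=ascent - depth, align="MIDDLE")
    # Assumption: a formula deeper than it is tall gets padding on top instead.
    return InlineImage(ascent, depth, pad_top=depth - ascent, align="MIDDLE")
```

For a formula 20 px above the baseline and 6 px below, `place_inline_formula(20, 6)` adds 14 px at the bottom, giving a 40 px image whose baseline is 20 px from the top, exactly at its middle.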

The real problem:
Internet Explorer once again broke the standards and decided that ALIGN="MIDDLE" meant aligning the middle of the image with the middle of the line, even though that concept is very vague in HTML (a line can use several font sizes, and resizing the window can pull in words from the next line that suddenly change the middle of the line). Unfortunately, other browsers copied this behaviour. While Konqueror renders it correctly, my Mozilla 20050326 gets it wrong.
The solution:
Obviously, the browsers have to be fixed. Mozilla bug 192077 was fixed on 30 August 2005, so new versions of Mozilla and Firefox (1.5 and later) will not show this problem. Dillo 0.8.3, Links 2.1pre3 and Amaya 9.1 don't implement HTML ALIGN at all. Konqueror and Opera do it the right way. And IE fixed it in some version (even though it was IE that led the others astray!). I used this test case to run some alignment tests.
What can be done in the meantime:
Wait for the new versions of Mozilla. Don't waste time looking for a workaround; I already tried, and it's useless: nothing compares to HTML ALIGN="MIDDLE". I have thought of several ideas; none of them is perfect. If you find more, discuss them on the LaTeX2HTML mailing list (I will be glad to add them here, too).

1. Using the padded images, apply CSS: vertical-align

CSS attribute selectors let you apply a style only to IMG elements that have ALIGN="MIDDLE": a rule like img[align="middle"] {vertical-align: -80%} will lower the padded images (overriding the HTML attribute). See the CSS specification for vertical-align.

However, I haven't found any pair of values (one for ALIGN="MIDDLE" and one for ALIGN="BOTTOM" images) that aligns their baselines at all font sizes. Percentages don't work either: -80% and -3% were perfect at my default font size, but increase it and the alignment gets worse.
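The reason percentages can't work is that CSS 2.1 resolves a percentage in vertical-align against the line-height, which grows with the font size, while the formula images keep their pixel height. The sketch below models that; the line-height factor of 1.2 and the 30 px image are assumptions for illustration, and the helper names are hypothetical.

```python
LINE_HEIGHT_FACTOR = 1.2  # assumption: "normal" line-height taken as 1.2 x font size


def percent_to_px(percent: float, font_size_px: float) -> float:
    """Distance in px that `vertical-align: <percent>%` moves an image
    (positive = up). CSS 2.1 takes the percentage of the line-height."""
    return percent / 100 * font_size_px * LINE_HEIGHT_FACTOR


def misalignment_px(percent: float, font_size_px: float, image_height_px: float) -> float:
    """Distance in px between the midpoint of a center-padded image and the
    text baseline (0 = perfect). vertical-align moves the image's bottom edge;
    the midpoint, which holds the formula baseline, is half the height above it."""
    bottom = percent_to_px(percent, font_size_px)
    return bottom + image_height_px / 2


# A hypothetical 30 px padded formula, lowered with vertical-align: -80%.
for size in (12, 16, 24):
    print(f"{size}px text: off by {misalignment_px(-80, size, 30):+.1f}px")
```

With these numbers the image is almost perfect at 16 px text (about -0.4 px), but roughly +3.5 px off at 12 px and -8 px off at 24 px: a value tuned for one font size drifts at every other.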

Anyway, the white padding pushes the next line down a little, so the result is ugly. A CSS rule like P { line-height: 18pt } (or a value in ex) can help, but browsers that don't understand this CSS or HTML, like Dillo, will show strange things. They should fix that and implement HTML ALIGN. But can't we remove the padding and align everything with CSS alone?

2. Remove the white padding, align with CSS: vertical-align

After a lot of work, I found the latex2html code that adds the padding. The following hack disables it for inline math, so formulas with descenders are rendered without the extra white margin at the bottom.

  --- /usr/bin/latex2html 2004-08-27 15:30:15.000000000 +0200
  +++ latex2html  2005-03-06 03:01:56.000000000 +0100
  @@ -6939,8 +6945,12 @@ sub make_latex{
       "\\newcommand\\lthtmlinlineB[1]{\\lthtmlmathtype{#1}\\egroup\\lthtmlhboxmathA}%\n" .
       "\\newcommand\\lthtmlinlineZ{\\egroup\\expandafter\\ifdim\\dp\\sizebox>0pt %\n" .
       "  \\expandafter\\centerinlinemath\\fi\\lthtmllogmath\\lthtmlsetinline}\n" .
  -    "\\newcommand\\lthtmlinlinemathZ{\\egroup\\expandafter\\ifdim\\dp\\sizebox>0pt %\n" .
  -    "  \\expandafter\\centerinlinemath\\fi\\lthtmllogmath\\lthtmlsetmath}\n" .
  +
  +    "\\newcommand\\lthtmlinlinemathZ{\\egroup\\expandafter\n" .
  +    "% DCL: If active, some equations have white borders at the bottom\n" .
  +    "% \\ifdim\\dp\\sizebox>0pt \\expandafter\\centerinlinemath\\fi \n" .
  +    "\\lthtmllogmath\\lthtmlsetmath}\n" .
  +
       "\\newcommand\\lthtmlindisplaymathZ{\\egroup %\n" .
       "  \\centerinlinemath\\lthtmllogmath\\lthtmlsetmath}\n" .
       "\\def\\lthtmlsetinline{\\hbox{\\vrule width.1em \\vtop{\\vbox{%\n" .
  

With the padding removed, I found these CSS rules rather useful:

  /* img[align="bottom"] { vertical-align: baseline !important; } */ 
  img[align="middle"] { vertical-align: middle !important; }

And the images don't shift vertically when the height of a line varies (only when you change the font size, of course)! Note, though, that with very large fonts the result is ugly, since we're mixing big letters with images of tiny formulae.

Note that if you test the second rule in Mozilla, you won't see any difference, because Mozilla treats ALIGN="MIDDLE" as if it were vertical-align: middle. Update: bug 192077 fixed that.
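Why do the images move only when the font size changes? CSS 2.1 defines vertical-align: middle as placing the midpoint of the image at the parent's baseline plus half the parent's x-height, whereas HTML 3.2 ALIGN="MIDDLE" places it on the baseline itself. This sketch (hypothetical function name and pixel sizes) computes where an unpadded formula's own baseline ends up under each rule: the line-height never enters the result, but the x-height, which scales with the font, does.

```python
def formula_baseline_offset(ascent_px: float, depth_px: float,
                            x_height_px: float, css_middle: bool = True) -> float:
    """Position of the formula's baseline relative to the text baseline
    (positive = above it) for an unpadded image of height ascent + depth.

    css_middle=True:  CSS 2.1 `vertical-align: middle`, image midpoint at
                      the parent baseline + half the parent's x-height.
    css_middle=False: HTML 3.2 ALIGN="MIDDLE", image midpoint on the baseline.
    """
    height = ascent_px + depth_px
    midpoint = x_height_px / 2 if css_middle else 0.0
    bottom = midpoint - height / 2   # image bottom edge relative to the baseline
    return bottom + depth_px         # the formula baseline is depth_px above it
```

For a formula 10 px tall and 4 px deep next to text with an 8 px x-height, the CSS rule puts its baseline 1 px above the text baseline; with a 16 px x-height it moves to 5 px above. HTML ALIGN="MIDDLE" would leave the same unpadded image 3 px too low at any font size, which is why latex2html pads it.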

Also note that Amaya had some problems with styles in HEAD. In fact, it failed most CSS1 tests (really disappointing...). I submitted a bug report.

But some formulas do need a little padding, so this solution is not perfect either. A possible solution would be to make latex2html add just the padding required to center all images with descenders using CSS vertical-align: middle instead of HTML ALIGN="MIDDLE". If that's possible, it would work in all browsers.

3. Add padding to all images

Wouldn't it be easier if all the images had that padding, as if they all had a descender? Then browsers would treat all images equally.

The problem is that not all formulas have the same depth (how far they descend below the baseline). For example, try this test case, which has formulas like x (no descender), y (descender), subscripts and big fractions. They should all align nicely.

Computing the height of the tallest inline image on the page and making all images that tall doesn't seem feasible: it would create large images whose white padding would show in some browsers.
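The sketch below models that idea with hypothetical pixel sizes: every image is padded so that its baseline is at the center, and all images share the height of the tallest one. The function name is an illustration, not part of latex2html.

```python
def pad_to_common_height(formulas: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """formulas: (ascent_px, depth_px) per inline image.
    Returns (pad_top, pad_bottom, height) so that every image has the same
    height and its baseline exactly at the vertical center."""
    half = max(max(ascent, depth) for ascent, depth in formulas)
    return [(half - ascent, half - depth, 2 * half) for ascent, depth in formulas]


# Hypothetical sizes: x (no descender), y (descender), a big fraction.
for pad_top, pad_bottom, height in pad_to_common_height([(8, 0), (8, 4), (20, 12)]):
    print(pad_top, pad_bottom, height)
```

Here the 8 px tall x grows into a 40 px image, 32 px of which is white padding, just because a fraction appears somewhere on the same page.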

So, this wouldn't work.

4. A different CSS rule for each image

Each image can be tagged with an ID and given its own CSS rule, like these:

  IMG[ID="img40"]		{ vertical-align: -17px !important ;  }
  IMG[ID="img35"]		{ vertical-align: -28px !important ;  }
  IMG[ID="img31"]		{ vertical-align: -17px !important ;  }
  IMG[ID="img6"]		{ vertical-align: -16px !important ;  }
  IMG[ID="img9"]		{ vertical-align: -25px !important ;  }
  .....

This requires latex2html to update the .css file after each run, and it's slower than a single general rule, but it works. Each rule lowers the image so that its vertical midpoint is aligned with the baseline of the text: the value is half the height of the image, in pixels, measured from the baseline of the text. Read the conclusions (further down) to see why this is probably the best solution (apart from HTML ALIGN="MIDDLE").
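A post-processing step could produce such a stylesheet. This Python sketch assumes the image IDs and pixel heights of the center-padded images are already known (for example, read from the generated image files); `css_rules`, `write_css` and the input mapping are hypothetical, not part of latex2html.

```python
from pathlib import Path


def css_rules(image_heights: dict[str, int]) -> str:
    """One rule per image. image_heights maps an image ID to the pixel height of
    its center-padded image; lowering the image by half its height puts its
    midpoint, and so the formula baseline, on the text baseline."""
    lines = []
    for image_id, height in sorted(image_heights.items()):
        shift = height // 2  # assumption: odd heights are rounded down to whole pixels
        lines.append(f'IMG[ID="{image_id}"]\t{{ vertical-align: -{shift}px !important; }}')
    return "\n".join(lines) + "\n"


def write_css(path: Path, image_heights: dict[str, int]) -> None:
    """Regenerate the stylesheet; this would have to run after every latex2html run."""
    path.write_text(css_rules(image_heights), encoding="utf-8")
```

For example, `css_rules({"img40": 34, "img35": 56})` produces the -17px and -28px rules for img40 and img35 shown above (sorted by ID).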

The need to update a .css file can be avoided by using the HTML style attribute instead. But that's an ugly hack, since it thoroughly mixes presentation with content. Also, HTML 3.2 (the version used by LaTeX2HTML) doesn't support the style attribute. Oops... and id isn't supported either, so both CSS solutions would require switching to a newer, more complex version of HTML.
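For comparison, this hypothetical helper shows the kind of markup the style-attribute variant would produce: no stylesheet to regenerate, but the pixel offset now lives inside every IMG tag. It assumes the same half-height rule as the per-ID rules, and it needs HTML 4 or later.

```python
from html import escape


def img_tag(src: str, alt: str, height_px: int) -> str:
    """Emit an IMG with the vertical shift in a style attribute instead of a
    .css file. Not valid HTML 3.2, which has no style (or id) attribute."""
    shift = height_px // 2  # lower the image by half its height
    return (f'<img src="{escape(src)}" alt="{escape(alt)}" '
            f'style="vertical-align: -{shift}px">')
```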

Also, Andrea Censi wrote code that uses this concept, but with the CSS margin-bottom property and a temporary blue dot to track the baseline of each image. It's at: rendering LaTeX
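Here is the blue-dot idea sketched in Python. This is not Andrea Censi's code: the marker color, its position (the last pixel row above the baseline) and the white background are all assumptions, and the image is modelled as a list of RGB pixel rows so that no imaging library is needed.

```python
Pixel = tuple[int, int, int]
MARKER: Pixel = (0, 0, 255)          # assumption: pure blue marks the baseline
BACKGROUND: Pixel = (255, 255, 255)  # assumption: white background


def locate_baseline(rows: list[list[Pixel]]) -> tuple[int, list[list[Pixel]]]:
    """Find the marker, erase it and return (margin_bottom_px, cleaned_rows).

    The marker is assumed to sit in the last pixel row above the baseline, so
    every row below it is descender depth. An inline image's bottom margin edge
    sits on the text baseline, so margin-bottom: -depth lowers it by that much."""
    for y, row in enumerate(rows):
        for x, pixel in enumerate(row):
            if pixel == MARKER:
                depth = len(rows) - 1 - y
                cleaned = [list(r) for r in rows]  # copy; leave the input untouched
                cleaned[y][x] = BACKGROUND         # remove the temporary dot
                return -depth, cleaned
    raise ValueError("no baseline marker found")
```

The returned value would then be written into the page as the image's margin-bottom, in pixels.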

Conclusions


That's all for now.



October 2005 to 09-12-2006, Daniel Clemente Laboreo, e-mail at n142857--at-g-m-a-i-l--dot-com. All code is free (any license).