LaTeX area

From Ciswikidb
Revision as of 11:26, 10 June 2022 by Bruker (talk | contribs)

General advice

Note: Some of the content on this page is based on Max's personal opinion.

We commonly need to create written articles of various formats and lengths as well as other visual material such as posters or slideshows. Most of the content gets used multiple times, yet the look and feel of the visual medium can vary wildly, ranging from phone screens to books to video projectors. Making the content look good and, ideally, reusing it in other documents involves adapting its style to the respective medium. For this reason, static content is bad.

What we need is a typesetting system that lets us create all the material (text, equations, tables, figures, everything) in one common organic format and transparently adapt its display style to the target medium. This involves reflowing the text according to the page/screen layout while making figures and other content of various information density optimally accessible to the viewer. Such a system is conceivable but presently does not exist. Layout-heavy file formats like Word / PowerPoint / etc. are broken by design and not worth considering for any purpose. Despite (or, more accurately, because of) their layout flexibility, they universally fail at producing sufficient quality for any medium. In principle, one could invent an abstract description language capable of doing everything we need, and such attempts in fact exist (e.g., extending XHTML with MathML, SVG for drawings, JavaScript-based interactive plotting like Bokeh), but the reality of it is that the client support and therefore the resulting quality of all existing document formats varies so wildly that they are effectively useless unless the range of target media is very limited. Ironically, the only format with good cross-platform support and printability is PDF, a format specifically designed to produce the exact same document on any target medium: this idea seemed fine 20 years ago but is the opposite of what is needed today and in the future.

Keeping in mind that technology is likely to evolve and these problems may all be a thing of the past 5 to 10 years from now, we have to be content with the best option available today, which is to use LaTeX for all our content and make a static PDF for each envisioned medium. While at least most computer screens are about the same aspect ratio and size today, the variety of phone screens limits what we can do with a handful of PDFs. It is, however, clear that paper and anything electronic are incompatible in terms of page layout, so a written document cannot be produced with fewer than 2 PDF files, and then screen presentations etc. are yet another matter.
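
A minimal sketch of what "one PDF per medium" can look like in practice: the content lives in one file, and a thin wrapper per medium decides the page geometry. The file names (content.tex, paper.tex, screen.tex) and the screen dimensions are assumptions made up for this illustration, not part of any template mentioned on this page.

```latex
% ---- content.tex (hypothetical name): shared body, no layout decisions ----
\section{Introduction}
Text, equations, tables, and figures go here.

% ---- paper.tex: print version, class defaults left untouched ----
\documentclass[a4paper]{article}
\begin{document}
\input{content}
\end{document}

% ---- screen.tex: on-screen version with a small 16:9 page ----
\documentclass{article}
% Placeholder dimensions; tune them to the target device.
\usepackage[paperwidth=128mm, paperheight=72mm, margin=10mm]{geometry}
\begin{document}
\input{content}
\end{document}
```

Each wrapper is compiled separately, so a change to content.tex propagates to both PDFs.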

Page layout

Letting a writer define their own page layout (margins etc.) almost universally leads to a terrible result because few people have a good understanding of typography. As a rule of thumb, the average number of characters per line should not be much greater than 60. With the exception of two-column layouts commonly found in journals, most pieces of scientific writing violate this rule, making it hard to read more than a page or two without getting fatigued. However, two-column layouts, while allowing for an acceptable line width for readability, cause different problems:

  • Any mismatch between page geometry and screen/paper geometry is exacerbated: If you have to zoom in to be able to read the font, you will need to scroll back to the beginning of the page to see the second column.
  • Some elements (figures, tables, equations) may need to be wider than a single column. While this can be accomplished by making them span across both columns, that makes the zooming issue even worse.
  • The lack of empty space on the page makes adding a binding correction impossible, so paper copies of these documents cannot be stapled or put in a folder.

The only way out of this dilemma is to use a single-column layout of reasonable line width and, in the case of A4/Letter paper, leave margins that look unusually large and wasteful to the eyes of most people corrupted by bad typesetting software. Such a layout has always been the default in LaTeX (e.g., documentclass article) for good reasons. The argument about saving paper by putting more text on it is completely irrelevant, and I am saying this as a massive environmentalist.

Typesetting of math, physics, and chemistry

The typographical details of mathematical content have a meaning. There is usually exactly one correct way, and anything else is not only wrong but also risks semantic distortion.

  • Read the SI brochure if you haven't. There are more binding ISO standards with basically the same content, but their numbers keep changing and some of the contents are not available for free.
  • NIST SP 811 elaborates on these standards; also a good read.
  • Here are some recommendations for how to do things right in LaTeX. While this document is from 1997, to my knowledge there is still no better universal way to solve the problem with the total differential operator.

The gist of it:

  • Units including prefixes are typeset upright (whether Greek or Roman, i.e., an italic mu for micro is wrong) and separated from the number and from each other by a (non-breaking) space. siunitx uses thin spaces, which tends to look better. Leaving out any of these spaces is not allowed because each of them stands for a multiplication.
  • Anything else that isn't a quantity is also typeset upright. This includes operators like the differential d, subscripts/superscripts representing a word/name (e.g., the B in Boltzmann's constant), and functions like sin, exp, etc.
  • Certain constants that aren't considered physical quantities are typeset upright, most notably i, e, and pi. There is therefore no ambiguity between the imaginary unit and the current density, or between the base of the natural logarithm and the elementary charge. Unless you do it wrong, which is why all this is so important.
  • Particles are typeset upright.

My LaTeX recommendations for a correct and non-tedious implementation of all this:

  • Exclusively use siunitx for any non-trivial numbers, any units, and any combination thereof. Turn off your brain and use all commands as intended. Read the manual if you haven't (and save it).
  • Use mhchem for chemical elements (manual).
  • Either \text or \mathrm can be used for roman typeface in math mode (e.g., subscripts that aren't quantities), but they treat spacing differently. Use \text when in doubt.
  • For particles, there are convenience packages, but I have never found them to actually be convenient. Writing \text{\textgamma} for the photon feels suboptimal, but it's how I currently do it.
  • If you use math mode for something that isn't math, or the other way around, chances are it's not the right way.
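
The recommendations above can be sketched as follows. This is an illustration under assumptions, not a house style: the numbers are placeholders, \dd is a hypothetical helper macro defined here (it does not come from any package), and \qty, \num, and \unit assume siunitx version 3 (older versions provide \SI and \si instead).

```latex
% Compile with lualatex.
\documentclass{article}
\usepackage{amsmath}             % provides \text
\usepackage{siunitx}             % numbers and units
\usepackage[version=4]{mhchem}   % \ce for chemical formulae

% Hypothetical helper: upright differential d with operator-like spacing.
\newcommand*{\dd}{\mathop{}\!\mathrm{d}}

\begin{document}
% Number and unit together; siunitx handles spacing and the upright micro sign.
A current of \qty{1.5}{\micro\ampere} at \qty{200}{\kilo\electronvolt},
a plain number \num{12345.678}, and a bare unit \unit{\milli\tesla}.

% Chemical formulae through mhchem rather than hand-built subscripts.
Water is \ce{H2O}; carbon dioxide is \ce{CO2}.

\begin{equation}
  % Upright subscript B (a name), upright d, upright e and i.
  E = k_{\text{B}} T, \qquad
  Q = \int I(t) \dd t, \qquad
  \mathrm{e}^{\mathrm{i}\varphi} = \cos\varphi + \mathrm{i}\sin\varphi
\end{equation}
\end{document}
```

Functions such as \cos and \sin are already upright when written as commands, so no manual font switching is needed for them.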

Fonts

Most fonts are useless for serious applications. Of the few that aren't, which one to use depends on the target medium: mostly its resolution, but also sub-pixel antialiasing capabilities etc. It is generally assumed that video projectors cannot deliver good-looking serif fonts, especially at the huge font sizes needed to reach everyone in the last row of seats. This is why beamer presentation templates tend to have very different fonts than anything meant for print or screen reading.

If you are publishing in a journal, you are bound to their template anyway. Their choices are usually not great, but there is nothing you can do.

For self-publishing, the LaTeX default font (Computer Modern or Latin Modern) looks OK on very-high-resolution print but is abysmal for on-screen reading. A good compromise that looks good on both screen and paper is utopia-otf, but the options are likely to evolve as more math-aware OpenType fonts are developed.

Figures

Figures are not disconnected entities that you import from some other software to be included as is. \includegraphics is for photos, not for schematic drawings or data plots.

Figures deserve to be an organic part of the typesetting of the document and should therefore be rendered therein. The best package to do this is TikZ. The length of the manual is scary, but it is a necessary read (at least the general parts; you'll quickly find out which parts you can skip). All other methods to generate vector graphics are deprecated. The idea is to describe the contents and visual style of the figure in an abstract language that pays fairly little attention to the physical details of the target medium. Advantages include:

  • Figures remain editable without relying on external software and corresponding input data.
  • Font typeface and size organically fit the document.
  • Adapting figures to a different page layout (e.g., aspect ratio) is trivial. This means you can copy the code of a figure from a paper to a 16:9 screen presentation without any scaling issues.
  • The achievable quality is better than anything you can do with other tools accessible to physicists.
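
As a small illustration of a figure described in the document itself, here is a sketch of a schematic drawn with TikZ. The node names and labels are invented for this example. Sizes are given relative to \linewidth, so the same code should adapt when it is pasted into a document with a different page layout.

```latex
\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{figure}
  \centering
  \begin{tikzpicture}[>=stealth]
    % Two boxes whose width and distance scale with the current line width.
    \node[draw, minimum width=0.2\linewidth] (src) at (0pt,0pt) {source};
    \node[draw, minimum width=0.2\linewidth] (det) at (0.6\linewidth,0pt) {detector};
    % Arrow between the boxes with a label typeset in the document font.
    \draw[->] (src) -- node[above] {beam} (det);
  \end{tikzpicture}
  \caption{Placeholder schematic drawn with TikZ.}
\end{figure}
\end{document}
```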

Data visualization, which is, of course, our bread and butter, can be handled by TikZ natively, but I prefer to use pgfplots, which extends TikZ with data plotting environments that are more feature-rich. I will say pgfplots is sometimes a pain to use. But it's the best you can do, so learn it. You will need a local copy of the manual, and Google (i.e., StackOverflow) is also very much your friend as soon as you want to do something exotic.
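
A minimal pgfplots sketch, to show the general shape of a data plot rendered inside the document. The data points are dummy values invented for illustration, the axis labels are placeholders, and the compat level is an assumption that should be set to match the installed pgfplots version.

```latex
\documentclass{article}
\usepackage{pgfplots}
\pgfplotsset{compat=1.17}   % assumed version; adjust to your installation
\begin{document}
\begin{tikzpicture}
  \begin{axis}[
    width=\linewidth,        % plot size follows the page layout
    height=0.6\linewidth,
    xlabel={$t$ / s},
    ylabel={$U$ / V},
  ]
    % Dummy data points, drawn as marks only.
    \addplot[only marks] coordinates {(0,0.0) (1,0.9) (2,2.1) (3,2.9)};
    % A straight line y = x for comparison, evaluated at the two end points.
    \addplot[domain=0:3, samples=2] {x};
  \end{axis}
\end{tikzpicture}
\end{document}
```

For real measurements, pgfplots can also read columns from a data file with \addplot table, which keeps the numbers out of the document source.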

Bibliography

biblatex with the biber backend deprecates everything that predates it, i.e., bibtex, natbib. The main reason why biber is vastly superior to bibtex is native UTF-8 support. However, lots of journals need the bibliography to be submitted in a certain way, and virtually none of them have caught up. For the time being, we sometimes have to resort to bibtex-compatible bibliography data to make journals happy.
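
For reference, a minimal biblatex setup with the biber backend might look like the sketch below. The file name references.bib and the citation key are hypothetical, and the numeric style is just one possible choice.

```latex
\documentclass{article}
\usepackage[backend=biber, style=numeric]{biblatex}
\addbibresource{references.bib}   % hypothetical file; may contain UTF-8 directly

\begin{document}
As shown previously~\cite{hypothetical-key}.   % placeholder citation key

\printbibliography
\end{document}
```

The usual build sequence is lualatex, then biber, then lualatex again, each run on the main file.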

Other recommendations

  • Don't use plain-TeX artifacts like \def, \rm, etc. Prefer LaTeX equivalents because they are more robust and compatible.
  • Always use LuaLaTeX, not pdfLaTeX, mainly for native UTF-8 support, but also for its relaxed memory limits etc. Ditch inputenc and fontenc; they are not needed with a UTF-8 engine (and can even cause problems). The future belongs to OpenType fonts, which can be used natively with fontspec.
  • The ancient way to encode diacritics and other special characters (e.g., \"a) should never be used. UTF-8 encoding combined with modern fonts takes care of everything. This is still true for T1 fonts if you prefer them.
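
A short sketch that puts these recommendations together. The macro name \vect is hypothetical and only serves to contrast \newcommand with \def; no particular font is selected, so fontspec falls back to its default.

```latex
% Compile with lualatex; neither inputenc nor fontenc is loaded.
\documentclass{article}
\usepackage{fontspec}   % native OpenType font support

% LaTeX-style definition instead of plain TeX's \def;
% \mathbf replaces the old {\bf ...} font switch.
\newcommand*{\vect}[1]{\mathbf{#1}}

\begin{document}
% Diacritics are typed directly as UTF-8 instead of \"o or \'e.
Schrödinger, Ångström, Poincaré.

The force is $\vect{F} = m \vect{a}$.
\end{document}
```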


CIS Tech Note template

The current version of the CIS tech note template is dated 06/10/2022. Please ask Max Bruker for a copy.

JLab Beamer template

A Beamer template conforming to JLab corporate identity standards is in development. Please ask Max Bruker for a copy.