LaTeX area

General advice

Note: Some of the content on this page is based on Max's personal opinion.

Recordings of Max's writing seminars can be found here: One, Two, Three.

We commonly need to create written articles of various formats and lengths as well as other visual material such as posters or slideshows. Most of the content gets used multiple times, yet the look and feel of the visual medium can vary wildly, from phone screens to books to video projectors. Making the content look good and, ideally, reusing it in other documents involves adapting its style to the respective medium. For this reason, static content is bad.

What we need is a typesetting system that lets us create all the material (text, equations, tables, figures, everything) in one common organic format and transparently adapts its display style to the target medium. This involves reflowing the text according to the page/screen layout while making figures and other content of varying information density optimally accessible to the viewer. Such a system is conceivable but does not presently exist. Layout-heavy file formats like Word, PowerPoint, etc. are broken by design and not worth considering for any purpose. Despite (or, more accurately, because of) their layout flexibility, they universally fail at producing sufficient quality for any medium.

In principle, one could invent an abstract description language capable of doing everything we need, and such attempts do exist (e.g., extending XHTML with MathML, SVG for drawings, and JavaScript-based interactive plotting like Bokeh). In practice, client support, and therefore the resulting quality, varies so wildly across all existing document formats that they are effectively useless unless the range of target media is very limited. Ironically, the only format with good cross-platform support and printability is PDF, a format specifically designed to produce the exact same document on any target medium. This idea seemed fine 20 years ago but is the opposite of what is needed today and in the future.

Keeping in mind that technology is likely to evolve and these problems may all be a thing of the past 5 to 10 years from now, we have to be content with the best option available today, which is to use LaTeX for all our content and make a static PDF for each envisioned medium. While most computer screens at least have about the same aspect ratio and size today, the variety of phone screens limits what we can do with a handful of PDFs. It is, however, clear that paper and anything electronic are incompatible in terms of page layout, so a written document cannot be produced with fewer than two PDF files, and screen presentations etc. are yet another matter.

Page layout

Screen or print documents

Letting a writer define their own page layout (margins etc.) almost universally leads to a terrible result because few people have a good understanding of typography. As a rule of thumb, the average number of characters per line should not be much greater than 60. With the exception of two-column layouts commonly found in journals, most pieces of scientific writing violate this rule, making it hard to read more than a page or two without getting fatigued. However, two-column layouts, while allowing for an acceptable line width for readability, cause different problems:

  • Any mismatch between page geometry and screen/paper geometry is exacerbated: If you have to zoom in to be able to read the font, you will need to scroll back to the beginning of the page to see the second column.
  • Some elements (figures, tables, equations) may need to be wider than a single column. While this can be accomplished by making them span across both columns, that makes the zooming issue even worse.
  • The lack of empty space on the page makes adding a binding correction impossible, so paper copies of these documents cannot be stapled or put in a folder.

The only way out of this dilemma is to use a single-column layout of reasonable line width and, in the case of A4/Letter paper, leave margins that look unusually large and wasteful to the eyes of most people corrupted by bad typesetting software. Such a layout has always been the default in LaTeX (e.g., \documentclass{article}) for good reasons. The argument about saving paper by putting more text on it is completely irrelevant, and I am saying this as a massive environmentalist.
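
As a rough sketch of such a layout, the following preamble spells out a narrow single-column text block on letter paper with a binding correction. The use of the geometry package and the specific values (360pt, 10mm) are illustrative assumptions, not settings taken from any template mentioned on this page; 360pt is simply the text width the article class picks by itself at 11pt.

 \documentclass[11pt,letterpaper,twoside]{article}
 % Assumption: geometry is used here only to make the numbers explicit.
 \usepackage[
   textwidth=360pt,     % narrow text block; article chooses the same width at 11pt
   bindingoffset=10mm,  % extra inner margin so a paper copy can be stapled or filed
 ]{geometry}
 \begin{document}
 Text goes here.
 \end{document}

The remaining horizontal space is distributed between the inner and outer margins by geometry, so the wide margins come for free once the text width is fixed.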

Slideshows

Virtually all talks, at conferences or elsewhere, involve a slideshow containing text that spans the full width of the projector. Don't be that person. Seeing this was already painful in the times of 4:3 screens and has been made far worse by widescreen devices. Of course, if all you are going to show is a list of items you want to talk about and there is no way to columnize it and no figure to embellish it with, you can end up putting that on the slide as is, but please refrain from full, multi-line sentences.

The type area on widescreen slides is actually a lot wider than 16:9 because most slideshow templates reserve some space at the top and bottom for the slide title and institute logos etc. But this, in itself, is not a bad thing; the natural way to use the space is to divide it into two columns, which lets you display two things side by side. I like putting an itemized list and a figure/table next to each other, but it could also be two lists or whatever fits the purpose. You may find putting a bunch of images, tables, and other stuff in random places on the slide is not as straightforward in LaTeX as it is in WYSIWYG software. This is not a flaw: it prevents you from doing something horrible. Make each slide about one thing; if you show two things on one slide, they must complement each other, not be about two totally different topics. The listener will always look at the figure that you are talking about at that very moment, so everything else on the slide is in the way and only keeps you from showing the important figure at a nice zoom level where everyone can read the axis labels. More about that later.
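
A minimal sketch of the two-column arrangement described above, using beamer's columns environment. The column widths, the frame title, and the placeholder rectangle standing in for the figure are assumptions chosen for illustration only.

 \documentclass[aspectratio=169]{beamer}
 \usepackage{tikz}
 \begin{document}
 \begin{frame}{One topic per slide}
   \begin{columns}[T] % align both columns at the top
     \begin{column}{0.45\textwidth}
       \begin{itemize}
         \item Short keyword phrase
         \item Another short phrase
       \end{itemize}
     \end{column}
     \begin{column}{0.5\textwidth}
       \centering
       % Placeholder for the one figure this slide is about
       \begin{tikzpicture}
         \draw (0,0) rectangle (5,3);
         \node at (2.5,1.5) {figure};
       \end{tikzpicture}
     \end{column}
   \end{columns}
 \end{frame}
 \end{document}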

Consider the following examples, one bad (on purpose) and one good (also on purpose).

[Figures: Bad slide.png (bad example), Good slide.png (good example)]

Typesetting of math, physics, and chemistry

The typographical details of mathematical content have a meaning. There is usually exactly one correct way, and anything else is not only wrong but also risks semantic distortion.

  • Read the SI brochure if you haven't. There are ISO standards with basically the same content that are more binding, but their numbers keep changing and some of their contents are not available for free.
  • NIST SP 811 elaborates on these standards; also a good read.
  • Here are some recommendations for how to do things right in LaTeX. While this document is from 1997, to my knowledge there is still no better universal way to solve the problem with the total differential operator.

The gist of it:

  • Units, including prefixes, are typeset upright (whether Greek or Roman; an italic mu for micro is wrong) and separated from the number and from each other by a non-breaking space. siunitx uses thin spaces, which tends to look better. Leaving out any of these spaces is illegal because the space acts as a mathematical operator (multiplication).
  • Anything else that isn't a quantity is also typeset upright. This includes operators like the differential d, subscripts/superscripts representing a word/name (e.g., the B in Boltzmann's constant), and functions like sin, exp, etc.
  • Certain constants that aren't considered physical quantities are typeset upright, most notably i, e, and pi. There is therefore no ambiguity between imaginary unit and current density or between the base of the natural logarithm and the elementary charge. Unless you do it wrong, which is why all this is so important.
  • Particles are typeset upright.
  • Physical/mathematical quantities represented by a single symbol are typeset italic, even if they are vectors or matrices. There should be no quantity symbols consisting of multiple glyphs. Historically, the only notable (and regrettable) exception to this is the Reynolds number. Let's not make "QE" a quantity symbol.
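
The rules above can be illustrated with a short math block. This is a sketch under assumptions: \dd is a hypothetical helper name, and its definition is one commonly seen way to get an upright differential with operator-like spacing, not necessarily the one given in the recommendations linked above.

 \documentclass{article}
 \usepackage{amsmath}
 % Assumption: \dd is a hypothetical helper for the upright differential.
 \newcommand*{\dd}{\mathop{}\!\mathrm{d}}
 \begin{document}
 \begin{align*}
   \int_0^\infty \mathrm{e}^{-x} \dd x &= 1
     && \text{upright e and d, italic $x$} \\
   E &= k_\mathrm{B} T
     && \text{B stands for a name, so it is upright} \\
   \Psi(t) &= A \, \mathrm{e}^{-\mathrm{i} \omega t}
     && \text{upright i cannot be confused with a quantity $i$}
 \end{align*}
 \end{document}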

My LaTeX recommendations for a correct and non-tedious implementation of all this:

  • Exclusively use siunitx for any non-trivial numbers, any units, and any combination thereof. Turn off your brain and use all commands as intended. Read the manual if you haven't (and save it).
  • Use mhchem for chemical elements (manual).
  • Either \text or \mathrm can be used for roman typeface in math mode (e.g., subscripts that aren't quantities), but they treat spacing differently. Use \text when in doubt.
  • For particles, there are convenience packages, but I have never found them to actually be convenient. Writing \text{\textgamma} for the photon feels suboptimal, but it's how I currently do it.
  • If you use math mode for something that isn't math, or the other way around, chances are it's not the right way.
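
A minimal sketch of these recommendations in use. The numerical values and formulas are made up for illustration; \qty and \unit assume siunitx version 3 (older versions call them \SI and \si).

 \documentclass{article}
 \usepackage{amsmath}            % provides \text
 \usepackage{siunitx}            % \qty and \unit need version 3
 \usepackage[version=4]{mhchem}  % provides \ce
 \begin{document}
 A number with its unit: \qty{200}{\kilo\electronvolt}.
 A bare number: \num{1.2e-9}.
 A compound unit on its own: \unit{\milli\ampere\per\centi\metre\squared}.
 A chemical formula: \ce{GaAs}, and a reaction: \ce{2 H2 + O2 -> 2 H2O}.
 A descriptive subscript: $k_\text{B}$ or $E_\text{kin}$.
 \end{document}

Because spacing, upright type, and exponent formatting are all handled by the packages, changing the output style later is a matter of package options rather than editing every number by hand.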

Fonts

Most fonts are useless for serious applications. Of the few that aren't, which one to use depends on the target medium: mostly its resolution, but also sub-pixel antialiasing capabilities etc. It is generally assumed that video projectors cannot deliver good-looking serif fonts, especially at the huge font sizes needed to reach everyone in the last row of seats. This is why beamer presentation templates tend to have very different fonts than anything meant for print or screen reading.

If you are publishing in a journal, you are bound to their template anyway. Their choices are usually not great, but there is nothing you can do.

For self-publishing, the LaTeX default font (Computer Modern or Latin Modern) looks OK on very-high-resolution print but is abysmal for on-screen reading. A good compromise that looks good on both screen and paper is utopia-otf, but the options are likely to evolve as more math-aware OpenType fonts are developed.

Figures

Figures are not disconnected entities that you import from some other software to be included as is. \includegraphics is for photos, not for schematic drawings or data plots.

Figures deserve to be an organic part of the typesetting of the document and should therefore be rendered therein. The best package to do this is TikZ. The length of the manual is scary, but it is a necessary read (at least the general parts; you'll quickly find out which parts you can skip). All other methods to generate vector graphics are deprecated. The idea is to describe the contents and visual style of the figure in an abstract language that pays fairly little attention to the physical details of the target medium. Advantages include:

  • Figures remain editable without relying on external software and corresponding input data.
  • Font typeface and size organically fit the document.
  • Adapting figures to a different page layout (e.g., aspect ratio) is trivial. This means you can copy the code of a figure from a paper to a 16:9 screen presentation without any scaling issues.
  • The achievable quality is better than anything you can do with other tools accessible to physicists.
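
As a sketch of what "abstract description" means here, the following schematic is defined relative to \linewidth, so the same code fills the text block of a paper or a beamer column without rescaling. The node labels and proportions are made up for illustration.

 \documentclass{article}
 \usepackage{tikz}
 \begin{document}
 \begin{figure}
   \centering
   % One coordinate unit is a tenth of the available line width.
   \begin{tikzpicture}[x=0.1\linewidth, y=0.1\linewidth, >=stealth]
     \node[draw, minimum width=0.2\linewidth] (src) at (1,0) {Source};
     \node[draw, minimum width=0.2\linewidth] (det) at (8,0) {Detector};
     \draw[->, thick] (src) -- node[above] {beam} (det);
   \end{tikzpicture}
   \caption{A placeholder schematic; labels use the document font.}
 \end{figure}
 \end{document}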

Data visualization, which is, of course, our bread and butter, can be handled by TikZ natively, but I prefer to use pgfplots, which extends TikZ with data plotting environments that are more feature-rich. I will say pgfplots is sometimes a pain to use. But it's the best you can do, so learn it. You will need a local copy of the manual, and Google (i.e., StackOverflow) is also very much your friend as soon as you want to do something exotic.
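
A minimal pgfplots sketch for orientation. The data points, axis labels, and the compat level are assumptions for illustration; real plots would typically read their data from a file.

 \documentclass{article}
 \usepackage{pgfplots}
 \pgfplotsset{compat=1.18}  % assumption: lower this if your pgfplots is older
 \usepackage{siunitx}
 \begin{document}
 \begin{tikzpicture}
   \begin{axis}[
       width=\linewidth, height=0.6\linewidth,
       xlabel={Time $t$ / \unit{\second}},
       ylabel={Current $I$ / \unit{\micro\ampere}},
     ]
     % Made-up data points
     \addplot[only marks] coordinates {(0,0.0) (1,1.1) (2,1.9) (3,3.2)};
     % Straight line y = x for comparison
     \addplot[domain=0:3, samples=2] {x};
   \end{axis}
 \end{tikzpicture}
 \end{document}

Tying width and height to \linewidth is what lets the same plot code be pasted into a document with a different page layout.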

Bibliography

biblatex with the biber backend supersedes everything that predates it, i.e., bibtex and natbib. The main reason why biber is vastly superior to bibtex is native UTF-8 support. However, lots of journals need the bibliography to be submitted in a certain way, and virtually none of them have caught up. For the time being, we sometimes have to resort to bibtex-compatible bibliography data to make journals happy.
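
A minimal sketch of a biblatex/biber setup. The file names references.bib and main.tex, the citation key, and the choice of the numeric style are assumptions for illustration.

 \documentclass{article}
 \usepackage[backend=biber, style=numeric]{biblatex}
 \addbibresource{references.bib}  % hypothetical file name
 \begin{document}
 As shown previously~\cite{abc}.
 \printbibliography
 \end{document}
 % Build sequence (main.tex is hypothetical): lualatex main, biber main, lualatex main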

Don't write your bibliography items yourself unless you have to. For journal articles, one can always get a machine-generated BibTeX entry from the publisher website. But be sure to check them for errors; there is at least one publisher whose entries will not render correctly (hyphen instead of en-dash as a range separator between years, incorrect capitalization in article title).

Note that the author list and the title receive special reformatting with a substantial amount of internal logic (TODO: cite the paper where this is explained). Always use the correct syntax and look at the compiler warnings if in doubt. Separate authors with "and", not commas, and write "and others" instead of "et al."; braces around a word in the title protect its capitalization:

 author={A. Author and B. Author and C. Author and others},
 title={{JLab} does things at {CEBAF}}

How the items end up getting displayed in the bibliography depends on the style used, which may be mandated by the publisher. In principle, LaTeX's bibliography backends are based on separating content from style, in line with the rest of LaTeX's philosophy. You should be able to copy database items from one document to the other with no regard to the respective citation style. In practice, there are limits to this because of compatibility issues brought about by the long history. For example, when using bibtex, one is forced to use @misc more often than seems sensible just to fudge an entry that defies categorization or doesn't quite work with the available database fields. Including a URL with an unpublished article can be such a case, even though it would seem trivial for it to be supported by the @article, @techreport, or even @unpublished types:

 @misc{abc,
   author={...},
   title={...},
   note={JLab tech note TN-01-2345},
   howpublished={\url{https://......}}
 }


Regardless of the bibliography package used, to get clickable URLs in the bibliography, the hyperref package must be included. Today, a machine-readable way to make a referenced document accessible from the PDF is wildly more important than quoting all the metadata, the DOI being the most standardized and sustainable way; however, unfortunately, citations still need to be human-readable as well, if only because a paper copy will not contain any clickable links.
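
As a sketch, assuming biblatex with one of its standard styles: an entry carrying a doi field is printed with its DOI, and loading hyperref turns it into a link. All field values below are placeholders, not a real reference.

 @article{placeholder2022,
   author  = {A. Author and B. Author},
   title   = {A placeholder title},
   journal = {Placeholder Journal},
   year    = {2022},
   doi     = {10.0000/placeholder}
 }

The corresponding preamble lines, with references.bib again being a hypothetical file name:

 \usepackage[backend=biber]{biblatex}
 \addbibresource{references.bib}
 \usepackage{hyperref}  % makes DOIs and URLs in the bibliography clickable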

Other recommendations

  • Don't use plain-TeX artifacts like \def, \rm, etc. Prefer LaTeX equivalents because they are more robust and compatible.
  • Always use LuaLaTeX, not pdfLaTeX, mainly for its native UTF-8 support but also for its relaxed memory limits etc. Ditch inputenc and fontenc; they are not needed with a UTF-8 engine (and can even cause problems). The future belongs to OpenType fonts, which can be used natively with fontspec.
  • The ancient way to encode diacritics and other special characters (e.g., \"a) should never be used. UTF-8 encoding combined with modern fonts takes care of everything. This is still true for T1 fonts if you prefer them.
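
A minimal sketch combining the three points above. The font TeX Gyre Pagella is only a stand-in for any installed OpenType font, and \kB and main.tex are hypothetical names chosen for illustration.

 % Compile with: lualatex main.tex
 \documentclass{article}
 % No inputenc or fontenc: LuaLaTeX reads UTF-8 natively.
 \usepackage{fontspec}
 \setmainfont{TeX Gyre Pagella}  % assumption: any installed OpenType font works
 % LaTeX-style definition instead of \def\kB{k_{\rm B}}:
 \newcommand*{\kB}{k_{\mathrm{B}}}
 \begin{document}
 Schrödinger, Ångström, and Compiègne are typed directly as UTF-8.
 Boltzmann's constant: $\kB$.
 \end{document}

Unlike \def, \newcommand reports an error if the name is already taken, which is one reason the LaTeX equivalents are more robust.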

Examples of common LaTeX code

The main reason to use LaTeX is to have access to crucial packages like siunitx, TikZ, and pgfplots. If you use them wrong or don't use them at all, the result will be just as catastrophic as what Word etc. can produce, and it is not surprising that lots of people feel like they are getting a bad deal, having to learn an admittedly antiquated and tedious language and still ending up with subpar results. Of course, templates and style files, if designed well, at least take care of page layout and font issues to some extent, but we also need to learn how to typeset the contents properly.

You can always ask Google how to solve a certain LaTeX (or TikZ or pgfplots) problem; it is often hard to find these solutions in the respective manuals. You will usually end up on StackExchange, where there is an abundance of good advice if you know what to ignore. But be extremely careful when looking for generic typesetting inspiration on the internet, including peer-reviewed journals. Most people do not know what they are doing, and this is reflected by the quality of published material.

As for examples of how to do our bread-and-butter tasks, instead of showing made-up examples here, I shall refer to the source code of my tech notes, talks, and posters, which contain plenty of examples of most things you need every day. I try to keep everything organized under /group/inj_group/Max/Publications (note, /group is called O: on CUE Windows boxes).

Templates

The following templates are under development and may contain material that cannot be publicly redistributed. Their visual style is supposed to comply with JLab corporate design in terms of colors, fonts, and logos.

Please ask Max Bruker for a copy of the files (or a shared Overleaf repository to clone, if preferred).

Tech note

Max's CIS tech note template (link accessible only on site) is dated 06/10/2022.

It is a single-column page layout with a wide margin that can accommodate citations, sidenotes, and small figures, similar to Tufte's books. Passing the option "screen" to the documentclass will produce a page geometry suitable for screen reading, whereas not passing it will output letter paper with appropriate margins. The type area is virtually the same, so you can get both output files from the same source very easily without worrying about float placement. Refer to the example source file for more information.

A recent tech note using this template is Two years of booster adventures at the UITF (link accessible only on site). The folder includes a zip with the source code if you are interested in the details of how the figures are made etc. Feel free to copy code as needed.

Poster (based on tikzposter)

Max's CIS poster template is dated 08/18/2022.

Example posters based on these files include WEPA12 and WEPA13 at NAPAC 2022. Feel free to ask Max for the source code.

Screen presentation (based on beamer)

Max's presentation template is also available.