
Python: packaging data and translation files

Posted at — Apr 04, 2022

Data files in your Python package

The cleanest way I’ve found has 3 steps.

MANIFEST.in

Your MANIFEST.in must include the files you want in the package. For example, to include all .mo files:

recursive-include your_python_package *.py *.mo
recursive-include tests *
include *.md
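
One way to confirm that MANIFEST.in picks up what you expect is to list the contents of the built source distribution. Here is a minimal sketch of that check; the helper name missing_from_sdist and the archive name in the usage comment are made up for the example.

```python
import tarfile


def missing_from_sdist(sdist_path, expected_suffixes):
    """Return the suffixes for which the sdist contains no file at all."""
    with tarfile.open(sdist_path) as archive:
        names = archive.getnames()
    return [
        suffix for suffix in expected_suffixes
        if not any(name.endswith(suffix) for name in names)
    ]


# Hypothetical usage, after building an sdist (for instance with
# `python -m build --sdist`):
# missing_from_sdist("dist/your_python_package-1.0.tar.gz", [".mo", ".md"])
```

An empty list means every expected kind of file made it into the archive.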

__init__.py

Every directory containing data files must be a Python package, even if it contains no code. In other words, each one must contain a file called __init__.py. For instance:

your_root/
|-- pyproject.toml
|
|-- your_python_package/
|   |-- __init__.py
|   |
|   |-- img/
|   |   |-- __init__.py
|   |   |-- some_image.jpeg
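
Forgetting one __init__.py in a deep tree is easy, so a small check can help. This is only a sketch (the function name dirs_missing_init is mine): it walks the package and reports every directory that lacks the file.

```python
import os


def dirs_missing_init(package_root):
    """Return the directories under package_root that lack an __init__.py."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(package_root):
        # Skip bytecode caches: they are not part of the package layout.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        if "__init__.py" not in filenames:
            missing.append(dirpath)
    return sorted(missing)


# Hypothetical usage from the root of the project:
# print(dirs_missing_init("your_python_package"))
```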

pkg_resources

When installed, your Python package may or may not be decompressed: it may be installed as plain files, or as an egg file (pretty much a .zip file). So you cannot reliably locate the data files with a hack like os.path.dirname(os.path.abspath(__file__)).

Instead, in your Python module, you must use pkg_resources. If you need the file to be extracted (for instance to use it with some C library), pkg_resources.resource_filename() is your friend.

For instance:

import PIL.Image
import pkg_resources

img_path = pkg_resources.resource_filename(
    'your_python_package.img', 'some_image.jpeg'
)
img = PIL.Image.open(img_path)

You can use pkg_resources.resource_filename with directories too. They will be extracted recursively.
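
A side note: the setuptools documentation now describes pkg_resources as deprecated in favour of the standard library's importlib.resources. If you can require Python 3.9 or later, the same idea can be sketched as below. The helper name package_file is mine, and this is an alternative to the approach above, not a drop-in replacement for every case.

```python
import contextlib
import importlib.resources


@contextlib.contextmanager
def package_file(package, name):
    """Yield a real filesystem path for a file shipped inside a package.

    If the package is installed zipped, the file is extracted to a
    temporary location that is cleaned up when the context exits.
    """
    resource = importlib.resources.files(package).joinpath(name)
    with importlib.resources.as_file(resource) as path:
        yield path


# Hypothetical usage, mirroring the pkg_resources example:
# with package_file("your_python_package.img", "some_image.jpeg") as img_path:
#     img = PIL.Image.open(img_path)
#     img.load()  # read the pixels before the temporary file may go away
```

The main difference is that the path is only guaranteed to exist inside the with block, so anything that needs the file must read it before leaving the block.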

Translations in your Python package

Some frameworks provide their own translation mechanism (Django for instance). If you’re not using one of those frameworks, then the most common way to add support for translations in your Python module is to use gettext (sudo apt install gettext). It’s a GNU tool that works with many other programming languages.

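Gettext provides tools to extract the strings to translate from your code, create and update the translation files, and compile them. Each step is described below.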

Flag the strings to translate

In your code, the strings to translate must be wrapped by a specific function call. Usually, this function is called _.

It’s common to map _ to gettext.gettext. Doing so lets xgettext find the strings to translate in the code, and also returns their translations (if available) at runtime. If no translation is available, the original string is returned by default.

One way to do this is:

from gettext import gettext as _

def some_func():
    print(_("some string to translate at runtime"))

Or:

import gettext

_ = gettext.gettext

def some_func():
    print(_("some string to translate at runtime"))
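
One pitfall worth illustrating: at runtime, _ looks up the exact string it receives. So the argument should be a plain literal, and any variable should be substituted after the lookup. A small sketch (the function name greet is just for the example):

```python
from gettext import gettext as _


def greet(name):
    # Translate the literal template first, then fill in the variable.
    # _(f"Hello, {name}!") would look up the already-formatted string,
    # which can never match an entry in the translation catalog.
    return _("Hello, {name}!").format(name=name)
```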

File structure

You must follow the same 3 steps as for any data files, as described previously. In the end, we aim at the following file structure:

your_root/
|-- pyproject.toml
|
|-- your_python_package/
|   |-- __init__.py
|   |
|   |-- i18n/
|   |   |-- __init__.py
|   |   |
|   |   |-- locales/
|   |   |   |-- __init__.py
|   |   |   |-- messages.pot
|   |   |   |
|   |   |   |-- fr/
|   |   |   |   |-- __init__.py
|   |   |   |   |
|   |   |   |   |-- LC_MESSAGES/
|   |   |   |   |   |-- __init__.py
|   |   |   |   |   |-- some_project_name.mo
|   |   |   |   |   |-- some_project_name.po
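
This layout is not arbitrary: Python's gettext looks for compiled catalogs at <localedir>/<language>/LC_MESSAGES/<domain>.mo. A quick way to check where it would look is gettext.find(). The sketch below assumes the domain is some_project_name, as in the tree above; the helper name find_catalog is mine.

```python
import gettext


def find_catalog(locales_path, language, domain="some_project_name"):
    """Return the path of the .mo file gettext would load, or None."""
    return gettext.find(domain, localedir=locales_path, languages=[language])
```

If this returns None for a language you expected to be available, the usual suspects are a misnamed .mo file or a missing LC_MESSAGES directory.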

Extraction of strings to translate

Here is the black magic command:

xgettext -k_ -kN_ \
	--from-code=UTF-8 \
	-o your_python_package/i18n/locales/messages.pot \
	$(find your_python_package -name \*.py)

It will generate a file messages.pot that contains all the strings to translate. This file can then be used to update your translation files.

Note: you can store the file messages.pot anywhere else if you want. Putting it there just makes writing a Makefile easier (see below).
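
About the -kN_ option in the xgettext command: it asks xgettext to also extract strings wrapped in N_. Python's gettext module does not provide N_; by convention it is a do-nothing marker that you define yourself, for strings that must be declared before the language is known and translated later. A minimal sketch (STATUS_LABELS and status_label are illustrative names):

```python
from gettext import gettext as _


def N_(message):
    """Mark a string for extraction without translating it yet."""
    return message


# Defined at import time, possibly before any translation is loaded.
STATUS_LABELS = {
    "ok": N_("Everything is fine"),
    "error": N_("Something went wrong"),
}


def status_label(status):
    # The actual lookup happens here, at call time.
    return _(STATUS_LABELS[status])
```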

Updating the strings to translate

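Assuming you’re working on the French translations, your translation file would be your_python_package/i18n/locales/fr/LC_MESSAGES/some_project_name.po. Once compiled, it becomes your_python_package/i18n/locales/fr/LC_MESSAGES/some_project_name.mo. The file name (without its extension) is the gettext domain you will use when loading the translations.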

If the translation file doesn’t exist yet, you must create it with:

msginit --no-translator -l fr \
	-i your_python_package/i18n/locales/messages.pot \
	-o your_python_package/i18n/locales/fr/LC_MESSAGES/some_project_name.po

If it does exist, you must update its strings with:

msgmerge -N -U \
	your_python_package/i18n/locales/fr/LC_MESSAGES/some_project_name.po \
	your_python_package/i18n/locales/messages.pot

You’re strongly advised to have a quick look at the man pages of msgmerge to make sure it matches the behaviour you want.
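
If you would rather drive this step from Python than from a shell, the create-or-update decision can be sketched as follows. The function name po_update_command is mine; it only builds the command line, using the same flags as the two commands above.

```python
import os


def po_update_command(po_path, pot_path, language):
    """Build the gettext command that creates or refreshes a .po file."""
    if os.path.exists(po_path):
        # Existing translations: merge the new template in place,
        # without fuzzy matching (-N).
        return ["msgmerge", "-N", "-U", po_path, pot_path]
    # No translation file yet: create one from the template.
    return [
        "msginit", "--no-translator", "-l", language,
        "-i", pot_path, "-o", po_path,
    ]


# Hypothetical usage (requires the gettext tools to be installed):
# import subprocess
# subprocess.run(po_update_command(po, pot, "fr"), check=True)
```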

Updating the strings

You can simply edit the .po files with any text editor. A classier way is to install a system like Weblate.

Compiling the translations

msgfmt \
	your_python_package/i18n/locales/fr/LC_MESSAGES/some_project_name.po \
	-o your_python_package/i18n/locales/fr/LC_MESSAGES/some_project_name.mo

Makefile

To keep track of all those commands, I advise you to add a Makefile at the root of your project. For instance:

PY_FILES := $(shell find your_python_package -name \*.py -print)


your_python_package/i18n/locales/messages.pot: $(PY_FILES)
	which xgettext || \
		(echo "You have to install xgettext (sudo apt install gettext)" ; exit 1)
	xgettext -k_ -kN_ --from-code=UTF-8 -o $@ $^


%.po: your_python_package/i18n/locales/messages.pot
	mkdir -p $(dir $@)
	[ -e $@ ] || msginit --no-translator -l fr -i $^ -o $@
	msgmerge -N -U $@ $^


%.mo: %.po
	msgfmt $^ -o $@


l10n_extract: your_python_package/i18n/locales/fr/LC_MESSAGES/some_project_name.po

l10n_compile: your_python_package/i18n/locales/fr/LC_MESSAGES/some_project_name.mo

.PHONY: l10n_extract l10n_compile

This Makefile has 2 targets.

One target to extract the strings to translate and update the translation file accordingly:

make l10n_extract

Then you can edit the translations in the .po files.

And one target to compile the translations:

make l10n_compile

Loading translations

You’re advised to look at the Python gettext documentation. You should have something akin to the following in your Python code:

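import gettext
import os

import pkg_resources

# Extracts the locales directory if the package is installed zipped.
locales_path = pkg_resources.resource_filename(
    'your_python_package.i18n', 'locales'
)
assert os.path.exists(locales_path)

# The domain must match the name of the .mo files (some_project_name.mo).
# install() makes _() available as a builtin in every module.
gettext.translation(
    'some_project_name', localedir=locales_path, languages=['fr']
).install()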
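
One thing to watch out for: install() puts _ into Python's builtins. A module that does from gettext import gettext as _ defines its own module-level _, which shadows the builtin and does not use the catalog loaded here. An alternative is to bind _ explicitly from the translation object. The sketch below assumes the domain is some_project_name; the helper name load_translator is mine.

```python
import gettext


def load_translator(locales_path, language, domain="some_project_name"):
    """Return a gettext-style function for one language.

    With fallback=True, a missing catalog yields the original strings
    instead of raising FileNotFoundError.
    """
    translation = gettext.translation(
        domain, localedir=locales_path, languages=[language], fallback=True
    )
    return translation.gettext


# Hypothetical usage in a module of your package:
# _ = load_translator(locales_path, "fr")
# print(_("some string to translate at runtime"))
```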