Setting up a new Python project

December 15, 2023
I thought it was a good idea to start writing about programming by giving some hints on creating a new Python project. Like most things, this may be done in several different ways. I'll describe my usual approach to managing Python versions, virtual environments, and project dependencies, and then there will be a few words on some tools I use and how those tools can be configured.

I use macOS day-to-day, and the things described here may vary on other operating systems, especially Windows (it should be fairly similar on Linux).


All links provided throughout the post are gathered at the end of the post, in the section Appendix A: Useful resources.

TL;DR

I use pyenv for managing Python versions and Poetry for managing project dependencies. You can configure (probably) all the tools you use in the project - like black, ruff, pytest, or coverage - in a single configuration file called pyproject.toml (it's not Poetry-specific, but Poetry uses it too).

Usually, with Python projects, I also use Docker for easier building, testing, and deployment. I wanted to describe it in this post but I decided that it would be better to cover that part in another one. Stay tuned!

Before we start

There is one very important thing that should be emphasized before we jump into details:

Do not use Python 2 unless it's absolutely necessary.

Python 2 is no longer supported. The most recent 2.x version was released back in April 2020 (!) and that was most likely the last one, which means even the security issues won't be fixed (see the official documentation for more details).

Because of that (and some other reasons too!), you should never use Python 2 for new projects, unless you need to use a library that does not support Python 3 (although in that case, I would probably consider replacing that library). Sometimes, in some legacy projects, it can't be avoided, but even then you should try upgrading to Python 3 whenever it's possible. Migration from Python 2 to Python 3 is also described in the official documentation, so if you need help with migrating your project, that's where you should start looking.

Okay, now we can move on to the fun parts.

Managing Python versions

When we start a new Python project, we need to use Python in some specific version. Usually, it's a good idea to use the most recent release, unless there are reasons not to (eg. some library you need does not support the most recent version). You may also want to keep more than one version installed so you can switch between them, which tends to be quite useful when working on multiple projects.

Python installed in the OS

Depending on the OS you use, some version of Python may already be installed in your system. You can check that by opening the terminal and running python or python3 commands.
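
For example, on macOS or Linux it may look something like this (the exact version and path will be different on your machine):

python3 --version
Python 3.9.6

which python3
/usr/bin/python3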

Whether this is the case or not, it's not that important. It's a good habit to stay away from the Python installed in the OS. To name a couple of reasons: installing some additional libraries may require root (or admin) privileges, and if you break it (and it's not impossible), something else will probably stop working as well.

Installing different Python versions in the system is also annoying and may cause issues. Every Python 3.x.x may be started using the python3 command, and having multiple versions means that the first one in your PATH will be used. You could be more explicit about it and run python3.9 or python3.11 to start a specific version, but it's not very convenient either, and the problem with installing libraries with admin privileges persists.
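
To illustrate (the paths and versions below are just an example), you can check which interpreters are on your PATH and which one the plain python3 command resolves to:

# List all python3 executables found on PATH, in order
which -a python3
/usr/local/bin/python3
/usr/bin/python3

# The first one wins...
python3 --version
Python 3.9.6

# ...unless you name the version explicitly
python3.11 --version
Python 3.11.7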

And yes, there may be some workarounds for that as well - I'm not saying it's impossible. What I am saying is that usually, there's no reason to use those workarounds. It may be necessary in some cases, but I won't dive into that in this post.

Python versions management

Some tools let you use different Python versions for different projects. In this post, I'll focus on pyenv because I like it the most. I have also used Anaconda before, so if you don't like pyenv then you may want to check that too! There may be some other tools for that but I haven't used them.

Anyway, no matter what tool you choose, the idea will be probably similar. You can create multiple virtual environments, and the Python version in each of them can be different. There are some advantages to that approach: you won't need admin (or root) permissions to install anything as the venvs will be created in your home (or project) directory, and you won't depend on your OS to provide you with the Python version you need. Also, it's very easy to create new environments and remove the old, redundant ones. In some cases, you can even reuse the same environment for multiple projects - it may sound weird but if you do a lot of prototyping with Jupyter, numpy, pandas and matplotlib (or similar libraries) then you can probably install them once and reuse the same environment for multiple prototypes.

When I say "virtual environment" (or "venv"), I essentially mean a separate directory with the Python installation and all installed packages. I'm gonna use that term a lot in this post so we must be on the same page. It's probably not the best definition but in the end, that's more or less what it is.

One more thing. You may have heard about virtual environments before. In fact, Python ships with the venv module out of the box. The difference is that a venv created this way always uses the Python interpreter it was created with. So it solves one issue - installing dependencies won't require elevated permissions. However, managing multiple Python versions is not something we can do using that tool on its own.
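
For completeness, here's a minimal sketch of how the built-in venv module is used - it will simply reuse whatever interpreter you invoke it with (requests is just an example package):

# Create a virtual environment in the .venv directory,
# based on the Python used to run this command
python3 -m venv .venv

# Activate it (macOS/Linux) and install packages without elevated permissions
source .venv/bin/activate
pip install requests

# Deactivate it when you're done
deactivate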

Using pyenv

As mentioned above, pyenv is one of the tools that can be used to manage Python versions and virtual environments. I'm not gonna describe the pyenv installation here, and I assume you already installed it in your system. To learn more, please take a look at the pyenv GitHub repository.

Installing Python versions

First, we need to download the Python version we want to use. To do that, just run pyenv install VERSION, for example:

pyenv install 3.11.7

# You'll see some more output here...

Installed Python-3.11.7 to /Users/user/.pyenv/versions/3.11.7

It may be the case that pyenv won't find the version you want to download. In such a case, it will display a command that you can run to fix that.

That output may be slightly different in other operating systems but the bottom line is that we now have a Python version that we can use in our projects. You can display the list of all downloaded versions with the following command:

pyenv versions
* system (set by /Users/user/.pyenv/version)
  3.11.7

Creating virtual environments

Now, we can create a virtual environment using the downloaded Python version. We can do that directly with pyenv or using other tools. I'll focus on pyenv now, and then I'll show you an alternative approach.

So, to create a new venv using pyenv you can run pyenv virtualenv VERSION NAME, for example:

pyenv virtualenv 3.11.7 sample-venv

You can activate that environment using pyenv activate sample-venv. Now, you can run python to run your scripts and pip to install the required dependencies. All of that will only affect the virtual environment, and you won't need root permissions for it.

Run pyenv deactivate to deactivate the environment, and pyenv virtualenv-delete sample-venv to delete it.
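
Putting it together, a typical session might look like this (requests and my_script.py are just placeholders for your own dependencies and code):

# Activate the venv created above
pyenv activate sample-venv

# Install and run things - no root permissions required
pip install requests
python my_script.py

# Deactivate and remove the environment when it's no longer needed
pyenv deactivate
pyenv virtualenv-delete sample-venv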

Managing project dependencies

Using pip

Technically, you could use pip to manage the dependencies in your projects, and you don't need any additional tools. But, in my experience, it's not the best idea.

pip might be a good option for very small projects, with a small number of dependencies. You could list all required packages in the requirements.txt file (the name can be different but it's a convention to use this one), and then you can run pip install -r requirements.txt to install all of them.
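
For example, a simple requirements.txt could look like this (the packages and versions are just an illustration):

# requirements.txt
requests==2.31.0
numpy>=1.26,<2.0
pandas==2.1.3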

As your project grows, the number of dependencies will grow as well. That means you'll need to update your requirements file manually, and that includes finding compatible versions of all dependencies. Most libraries have some dependencies too, and you need to make sure that they are all compatible. You'll also need to update the file manually if you want to upgrade or remove packages.

Defining the version of each package is not always enough. To make sure that we always use exactly the same packages, we usually need a lock file that contains more detailed information about each installed package, including a checksum that identifies each package unambiguously. pip does not offer that mechanism. We could use pip freeze, but it's not really sufficient. Those small dependency issues can grow over time, and you may eventually find yourself unable to build your project.

You may also want to split your dependencies into two or more groups. I think the most popular choice is to define dependencies and dev dependencies. The difference is that the dev dependencies should only be installed while working on the project, but not for production. A good example could be some testing tools (like pytest) or linters (eg. pylint, ruff, flake8). Usually, you don't need any of those in your production builds. While using pip, you'd need two (or more) files with dependencies to achieve that (eg. requirements.txt and requirements.dev.txt) as it's not possible to group the dependencies in a single file. The dependencies in all files need to be maintained as described above.
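
With pip, one common pattern is to let the dev file include the base one, so locally you only install a single file (again, just a sketch):

# requirements.dev.txt
-r requirements.txt

pytest==7.4.3
ruff==0.1.3

Then pip install -r requirements.dev.txt installs both groups locally, while production builds stick to pip install -r requirements.txt.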

Again, I'm not saying those issues can't be solved, and again, there are ways to avoid them without using weird workarounds.

Using Poetry

Probably the best tool for that (that I know of) is called Poetry. Managing dependencies is one of its features, and it does it very well.

From now on, I'll assume that you already installed both pyenv and Poetry in your system and that you installed at least one Python version with pyenv. You can also create a venv with pyenv but it's not necessary.

While writing this, I was using Poetry 1.3.0. The commands shown below may differ for other versions.

Creating a new Poetry project is a very simple process that you can initiate with the command poetry init. You'll be asked to provide the project name, author, Python version that you want to use, and dependencies (including dev dependencies). Just remember to keep the Python version consistent with the one you downloaded (or download another one if necessary). For example, in the previous section, we downloaded version 3.11.7, so you could use ^3.11 as the Python version in your project.

This is what a sample output of that command may look like:

poetry init
This command will guide you through creating your pyproject.toml config.

Package name [sample-project]:
Version [0.1.0]:
Description []:
Author [Sample User <user@example.com>, n to skip]:
License []:  MIT
Compatible Python versions [^3.11]:  ^3.11

Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interactively? (yes/no) [yes] no
Generated file

[tool.poetry]
name = "sample-project"
version = "0.1.0"
description = ""
authors = ["Sample User <user@example.com>"]
license = "MIT"
readme = "README.md"
packages = [{include = "sample_project"}]

[tool.poetry.dependencies]
python = "^3.11"


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"


Do you confirm generation? (yes/no) [yes]

Once this is done, a new file - pyproject.toml - is created. It holds the configuration for the project and, optionally, for the tools used in the project (eg. pytest, ruff, black, etc.). Managing the dependencies in a Poetry project can be done using the poetry add and poetry remove commands. Before we can do that, we need a virtual environment to begin with.

We can either use an existing Python environment (for example the one we created before), or we can choose a Python version that we want to use and create a new environment for our project only. Which approach is better? That depends. We'll discuss both options so you can make your own choice.

Before we get there, let's talk about Poetry and the structure of our project.

Poetry project structure

Managing project dependencies is just one of the features that Poetry provides. It also makes it very easy to package the project and publish it to a repository of your choice. Because of that, it's worth thinking about the structure of your project to make it as easy to maintain as possible.

When it comes to the source code, you may or may not put it in the src directory. I usually do, as it helps keep the structure of all my projects similar. In that case, you'll probably need to update this line in your configuration file:

- packages = [{include = "sample_project"}]
+ packages = [{include = "sample_project", from = "src"}]

You'll also need to create the README.md file, otherwise Poetry will fail to install the project. Alternatively, you can remove the readme = "README.md" line from the configuration file if you don't want one. I'll assume we want to have it though (it's usually a good idea to describe your project in it, at least briefly).

In this scenario, the structure of your code may look like this:

├── README.md
├── pyproject.toml
└── src
    └── sample_project
        └── __init__.py

Now, a few words about tests. You can either create tests directly in your packages, or you can keep them in a separate directory. So, given some sample packages and modules, the structure of your project may grow into one of the following:

# Option 1 - Storing the tests in the packages

├── README.md
├── poetry.lock
├── pyproject.toml
└── src
    └── sample_project
        ├── __init__.py
        ├── another_package
        │   ├── __init__.py
        │   ├── another_module.py
        │   └── tests
        │       ├── __init__.py
        │       └── test_another_module.py
        └── sample_package
            ├── __init__.py
            ├── sample_module.py
            └── tests
                ├── __init__.py
                └── test_sample_module.py


# Option 2 - Storing the tests in a separate directory

├── README.md
├── poetry.lock
├── pyproject.toml
├── src
│   └── sample_project
│       ├── __init__.py
│       ├── another_package
│       │   ├── __init__.py
│       │   └── another_module.py
│       └── sample_package
│           ├── __init__.py
│           └── sample_module.py
└── tests
    ├── __init__.py
    ├── another_package
    │   ├── __init__.py
    │   └── test_another_module.py
    └── sample_package
        ├── __init__.py
        └── test_sample_module.py

If you use pytest, the second approach may require adding the following lines to the configuration file:

[tool.pytest.ini_options]
pythonpath = ["."]

Without that, your tests may struggle to import the code from the source directory.
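
For example, a test in tests/sample_package/test_sample_module.py would import the code using the package's absolute path (here I'm pretending sample_module defines an add function, just for illustration; the import itself works once the project is installed into the environment with poetry install):

from sample_project.sample_package.sample_module import add


def test_add():
    assert add(2, 3) == 5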

Personally, I usually use the second approach and keep my tests separated from the rest of the code. It might be more difficult for large projects though, and sometimes it may be easier to reuse some parts of your code if you keep the tests closer to the source code. I guess it's just another decision to make.

Using Poetry with an existing virtual environment

The first approach assumes that we already have a virtual environment that we want to use. It's probably better to use a fresh environment for each project unless the list of dependencies is the same (or almost the same) for multiple projects. Otherwise, you may run into issues with installing different versions of your dependencies. So, in this section, I'll assume we have a fresh environment and we don't need to worry about that.

If you activate a Python environment, Poetry will detect that and use it automatically. It's important to remember that because it's going to do that even if you configure Poetry otherwise (more about that later). So, following the previous sections, we can run pyenv activate sample-venv to activate the one we created before, and from now on, we can start working on our project.
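
In practice, that could look something like this:

# Activate the environment created earlier with pyenv
pyenv activate sample-venv

# Poetry detects the active environment and uses it
poetry env info
poetry install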

Using Poetry with a selected Python version

Another way to go is to create a venv automatically with Poetry. That environment will only be applied to that one project. That separation has advantages - you don't need to worry about breaking any other project's dependencies, and other projects won't break dependencies for this one.

With this approach, we can decide where that new venv should be created. By default, Poetry will store it in your home directory (on macOS, it may be something like /Users/user/Library/Caches/pypoetry/virtualenvs/). But there's a way to change that. If you want to create it in the root directory of your project, run these two commands:

poetry config --local virtualenvs.create true
poetry config --local virtualenvs.in-project true

or you can create a file called poetry.toml in your project's root directory (the commands above do exactly that - they write this file for you):

[virtualenvs]
create = true
in-project = true

This configuration tells Poetry that you want to create the environment in the project's directory.

That's one thing, but we're not done. We need to make sure that the Python version we downloaded with pyenv before can be reached from our terminal session without activating a virtual environment, and then we need to tell Poetry to use that version. To achieve that, execute these commands:

# First, make sure that we can use the downloaded Python version
pyenv shell 3.11.7

# Now, tell Poetry to use it
poetry env use 3.11.7
Creating virtualenv sample-project in /Users/user/dev/sample-project/.venv
Using virtualenv: /Users/user/dev/sample-project/.venv

Once you do that, you'll notice that a new directory has been created in your project - .venv. This is where your virtual environment is going to be. It's important to add that to .gitignore or another file that tells your version control system that it should not be stored in your repository.
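
For Git, that means adding a line like this to your .gitignore:

# .gitignore
.venv/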

Basic Poetry commands

I'll try to describe a few important Poetry commands here. You can see the full list of commands by running poetry list, and each command has additional documentation that you can see by adding the --help flag to the command. For example, to learn about the poetry install command, you can execute poetry install --help.

Actually, poetry install is the first command in our list. It's used to install the dependencies in the virtual environment.

The next one is poetry add, and you can use it to add new dependencies to your project. For example, to install the latest release of numpy just run poetry add numpy. You can specify the version of the library by using the @ character, for example, poetry add numpy@^1.26.1. And to install something as a dev dependency, you can add the --group parameter to the command, eg. poetry add --group=dev pytest.

If you decide that you don't need some library anymore, you can remove it from the project with the command poetry remove. It also takes the optional --group parameter, so to remove that pytest we installed above, run poetry remove --group=dev pytest.

The last one I'll talk about here is poetry run. It's used to run the executable programs installed with the dependencies, and that includes Python. So to run a Python script, you can run poetry run python src/main.py or something like that, and to execute the tests with pytest, use the command poetry run pytest. The same goes for formatting the code with black (poetry run black src), and so on.

There is one more file worth discussing - poetry.lock. It's used to store very detailed information about every single library we use in our project. Thanks to that, we can be sure that exactly the same versions of all libraries are always installed. Poetry updates it automatically when you install and remove packages and you should not update it manually (unless you really know what you're doing). It can be regenerated using the poetry lock command.
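
To put those commands together, a typical flow might look like this (the package names are just examples):

# Install everything defined in pyproject.toml (and poetry.lock, if present)
poetry install

# Add a runtime dependency and a dev dependency
poetry add numpy@^1.26.1
poetry add --group=dev pytest

# Run code and tools inside the project's environment
poetry run pytest
poetry run black src

# Remove a dependency you no longer need
poetry remove numpy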

Configure your project with pyproject.toml

In most projects, you'll probably use tools like code linters and formatters, or maybe pytest to run the tests. Some of those tools can be configured. For example, code linters may check only some specific rules, and we actually saw an example of the pytest configuration above (the one with the Python path).

A very cool thing about pyproject.toml is that it can be used to configure most (if not all) of those tools. Otherwise, you'd probably need a separate configuration file for each of them.

Below, there is a sample file with some configuration options added. That file can be used as a starting point for most Python projects - just remember to update the library versions (it's important to keep things up-to-date). In fact, that's more or less what I use for most of my projects (along with more specific libraries).

This is the content:

[tool.poetry]
name = "sample_project"
version = "0.1.0"
description = ""
authors = []
readme = "README.md"
packages = [
    {include = "sample_project", from = "src"},
]

[tool.poetry.dependencies]
python     = "^3.11"

[tool.poetry.group.dev.dependencies]
black      = "^23.10.1"  # MIT
pytest     = "^7.4.3"    # MIT
pytest-cov = "^4.1.0"    # MIT
ruff       = "^0.1.3"    # MIT

[tool.ruff]
select = ["B", "D", "E", "F", "I", "N", "Q"]
ignore = [
    "D100",  # Missing docstring in public module
    "D104",  # Missing docstring in public package
    "D105",  # Missing docstring in magic method
    "D106",  # Missing docstring in public nested class
    "D107",  # Missing docstring in `__init__`
    "D200",  # One-line docstring should fit on one line
    "D203",  # 1 blank line required before class docstring
    "D205",  # 1 blank line required between summary line and description
    "D212",  # Multi-line docstring summary should start at the first line
    "D213",  # Multi-line docstring summary should start at the second line
    "D415",  # First line should end with a period, question mark, or exclamation point
    "D400",  # First line should end with a period
    "F403",  # from {name} import * used; unable to detect undefined names
    "F405",  # {name} may be undefined, or defined from star imports:
]
ignore-init-module-imports = true
show-fixes = true

[tool.ruff.flake8-unused-arguments]
ignore-variadic-names = true

[tool.ruff.lint.pydocstyle]
convention = "google"

[tool.pytest.ini_options]
addopts = [
    "-vv",
    "--cov=sample_project",
    "--cov-report=term",
    "--cov-report=html",
    "--cov-fail-under=100",
]
pythonpath = ["."]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",         # Have to re-enable the standard pragma
    "raise NotImplementedError" # Ignore not implemented methods
]

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

Some sections are managed and used by Poetry itself so I'm not gonna talk about them, but some are a little bit more interesting.

One example may be the configuration of ruff (which is a code linter and formatter). In the [tool.ruff] section, we can see the linting rules it's gonna check (and the ones it should ignore), and in the [tool.ruff.lint.pydocstyle] section, we set the docstring convention we use in the project.

The [tool.pytest.ini_options] section defines parameters that will be passed to pytest every time we run the tests, and its pythonpath entry fixes the Python path issue we talked about earlier.

We also have some additional rules for the coverage library - in the [tool.coverage.report] section, we tell it to ignore some statements in the code and not include them in the test coverage calculations.

Thanks to the fact that this is a very common configuration format, most tools you're going to use can probably be configured in that file, and there should be documentation available for that as well (for some examples, see the documentation of pytest, ruff, or coverage).

Also, it's possible to add comments to the file. I encourage you to use that possibility to keep everything as clear as possible.
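
With all of that in place, the tools pick up their settings from pyproject.toml automatically, so running them stays simple:

# ruff reads the rules from the [tool.ruff] sections
poetry run ruff check src tests

# black formats the code (using its defaults here, as we didn't configure it)
poetry run black src tests

# pytest picks up addopts, so this already runs with coverage enabled
poetry run pytest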

Conclusion

Okay, this turned out to be a bit longer than I initially anticipated. Let's summarize what we learned today.

To work on Python projects, it's good to have a way to manage Python versions, and to stay away from Python preinstalled in your OS. For that, I use pyenv but there are alternatives (eg. Anaconda).

Also, you'll probably need a tool to manage dependencies in your projects. Pip may be enough for small ones but I'd recommend using Poetry anyway since it gives you many cool features on top of dependency management.

As mentioned above, I decided to describe using Docker in a separate post as there may be quite a lot to talk about. I'll also try to dive deeper into other aspects of Python projects, such as tests, using environment variables to configure your applications, using CI/CD tools to automate as much as possible, etc.

I tried to explain everything with as many details as necessary to provide enough context, but I didn't want to make it too long. I realize that there are other ways to achieve everything I described here. My goal was just to give you an idea of how it can be done, not to give you the one and only recipe for all of it. If you have some suggestions, know better solutions to some of the problems I talked about in this post, or noticed something that doesn't make sense, please do let me know in the comments. I'd love to learn more, and I'll do my best to keep the post updated and bug-free.

Thanks for reading and see you in the next one!

Appendix A: Useful resources

Appendix B: Post update history

Date                 Updates
December 15, 2023    Initial version published