This highlights the main issue I have with Python today, and that's running Python apps.
Having to ship a compiler to a host or container to build Python with pyenv; needing all kinds of development headers like libffi for Poetry; the hoopla around `poetry init` (or pipenv's equivalent) to get a deterministic environment and package set. Or you use requirements files and don't get deterministic results.
Or you use effectively an entire operating system on top of your OS to use a conda derivative.
And we still haven't executed a lick of python.
Then there's the rigmarole around getting one of these environments to play nice with cron, having to manipulate your PATH so you can manipulate your PATH further to make the call.
It's really got me questioning assumptions about which language "quick wins" should be written in.
You can use Bazel to build a self-contained Python binary that bundles the interpreter and all its dependencies by using a py_runtime rule[1]. It's fairly straightforward and doesn't require much Bazel knowledge - there are simple examples on GitHub[2].
There are a couple other tools that take the same approach, including PyOxidizer[3], which was written by a Mercurial maintainer.
> As far as I know the only language making static binaries easily is Go, but it was a first class language design principle.
Rust does this as well.
The official high level build tool, `cargo`, uses a declarative TOML file for dependency management and supports lock files for deterministic builds. The default output is a single, statically linked binary.
Rust does depend on libc (like Go) which brings in dynamic linking on some platforms. But Cargo supports easy cross-compilation, and the `x86_64-unknown-linux-musl` target will produce a fully static binary.
> Binaries produced with PyOxidizer are highly portable and can work on nearly every system without any special requirements like containers, FUSE filesystems, or even temporary directory access. On Linux, PyOxidizer can produce executables that are fully statically linked and don’t even support dynamic loading.
Rust can definitely do it, but there are still a lot of gotchas. Many languages can do it, but there are so many pitfalls; for example, depending on the host's timezone database package.
I would argue Rust does it much better than Go. When you have to resort to hacks like cgo that subtly change the performance and functional characteristics of your program, I wouldn't call it "first class". It's good, don't get me wrong; I like how Go cross-compiles most things. But I wouldn't call it the gold standard as long as cgo continues to be a thing.
Edit: I mention cgo because many who want to cross-compile a statically linked binary will want to interface with other libraries via FFI, and this is a huge gotcha. It is a bit tangential to strict "static binary building".
I decided to drag myself kicking-and-screaming to the 21st century and start writing my handy-dandy utility scripts in python instead of bash. All was well and good until I made them available to the rest of my team, and suddenly I'm in python dependency hell. I search the internet and there are a lot of different solutions but all have their problems and there's no standard answer.
I decided "to heck with it" and went back to bash. There's no built-in JSON parser but I can use 'grep' and 'cut' as well as anyone so the end result is the same. I push it to our repo, I tell coworkers to run it, and I wash my hands of the thing.
jq has been a lifesaver for me parsing json in bash. Of course, it's an external utility not present by default in most systems.
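For anyone who hasn't used it, a typical jq invocation looks something like this (assuming jq is installed):

```shell
# Extract one field from each element of a JSON array.
echo '[{"name":"alice","id":1},{"name":"bob","id":2}]' | jq -r '.[].name'
# alice
# bob
```

The `-r` flag prints raw strings instead of JSON-quoted ones, which is usually what you want when feeding the result to other shell tools.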
Another thing to consider is more of a middle-ground approach. Most systems do have a python interpreter, so you can use a lot of base python without worrying about dependency hell. I use inline python in bash all the time, e.g.
ls | python -c 'import sys,json;lines=sys.stdin.read();print(json.dumps(list(filter(bool,lines.split("\n"))),sort_keys=True,indent=2))'
You can even use variable substitution if you surround the Python code in double quotes, and even mix f-strings and bash substitution:
python -c "print(f'Congrats, ${USER}, you are visitor number ${RANDOM}. This is {__name__}, running in $(pwd)')"
Or use a heredoc to not worry about competing quote chars:
# python << EOPYTHON
print("Congrats, ${USER}")
print("You are visitor ${RANDOM}")
print(f"This is {__name__}, running in $(pwd)")
print("It's a heredoc to allow both quote characters")
EOPYTHON
Great trick with using the python standard lib! Thanks for posting that.
edit: You probably already know this, but for anyone reading along, piping `ls` is unsafe if you plan to use the paths for anything except for printing them out. A path on linux can contain any byte except for NULL, so when `ls` prints them out, you can get broken behavior if you try to break on newlines.
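A sketch of a safer pattern: NUL-delimit the paths with `find -print0` instead of parsing `ls` output, so filenames with spaces or newlines survive intact:

```shell
# NUL-delimit paths so filenames with newlines or spaces survive intact.
cd "$(mktemp -d)"                  # scratch dir for the demo
touch 'plain.txt' 'with space.txt'
find . -maxdepth 1 -type f -print0 | xargs -0 -I{} printf 'found: %s\n' {}
```

Both GNU and BSD `find`/`xargs` support `-print0`/`-0`, so this is reasonably portable.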
Just a question - why do you have a dependency hell? You could restrict yourself to the Python standard library, and you would only have one dependency. The Python standard library is much nicer than bash if you need more complex data structures than what bash provides.
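For instance, a stdlib-only heredoc (in the style used elsewhere in this thread) covers the JSON-parsing case with zero third-party deps; the JSON payload here is just an illustration:

```shell
# Parse JSON with only the standard library -- no pip, no venv.
python3 - << 'EOPYTHON'
import json
data = json.loads('{"user": {"name": "alice", "id": 42}}')
print(data["user"]["name"])
EOPYTHON
# alice
```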
"grep" and "cut" are not Bash; they are separate programs with dramatically different feature sets between distributions and OSes (grep on macOS is very different from GNU grep on a modern Linux distribution, and there are many incompatibilities). Many scripts that work on Linux won't work on Mac because of this.
With Bash, your best bet for portability is to run scripts in a Docker container. If you want portable code, you have to bundle your dependencies--there's no free lunch here, including Bash.
When I was at Google I had a similar problem (team wasn't using Blaze). So what I did was have a wrapper entrypoint around every Python entrypoint that would just run that Python entrypoint (e.g. foo would execute foo.py). The advantage was that the shell script would first set up a virtual environment for every entrypoint and install all the packages in the requirements.txt that sat beside the entrypoint (removing any new ones). Each requirements.txt was compiled from a requirements.in file via pip-compile (part of pip-tools) [1], which meant that devs only had to worry about declaring the packages they actually directly depended on. Any change to requirements.in required you to re-run pip-compile, which wouldn't (by default) upgrade any packages and would only lock whatever the current version is (automated unit tests validated that every requirements.txt matched its requirements.in file).
This didn't solve the multiple versions of python on the host. That was managed by having a bootstrap script written in python2 that would set up the development environment to a consistent state (i.e. install homebrew, install required packages) that anyone wanting to run the tools would run (no "getting started guides") which also versioned itself & was idempotent (generally robust against running multiple times). We also shipped this to our external partners in the factory. Generally worked well as once you ran the necessary scripts once no further internet access was required.
It wasn't easy but eventually it worked super reliably.
I actually did something very similar when my application had to execute a Python script on any old box and I was strictly forbidden to make any changes on the host machine. My application refused to start if Python 3 wasn't found, so I didn't have to deal with that mess. It ran bash, set up the venv, did python-y stuff, cleaned up the venv: take only pictures, leave only footprints.
The caveat is that with mine the venv wasn't destroyed at the end of execution. Instead I put a snapshot of the sha256sum of the requirements.txt file which I double-checked on boot. If that changed then I ran pip-sync.
This was critical for devs because this was the underlying thing for all scripts devs ran (build system, terminal to device, unit tests, etc etc). Startup latency was key & I spent time optimizing that to feel as instant as a native executable unless the virtual environment changed which isolated the expensive part (& generally happened more & more rarely for any given tool as I found the dependency set to mature & freeze pretty quickly).
This had a great side benefit making it super-easy to run the scripts once on an internet-connected device & then use that as the base image for all the factory machines that could then be offline because all the virtual envs had been initialized.
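The hash-gating trick described above might look something like this (a sketch; the file contents are a demo, and the echo stands in for the real pip-sync call):

```shell
# Re-sync the venv only when requirements.txt has actually changed.
cd "$(mktemp -d)"                      # scratch dir for the demo
printf 'requests==2.25.1\n' > requirements.txt
STAMP=.req.sha256
current=$(sha256sum requirements.txt | cut -d' ' -f1)
if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP")" != "$current" ]; then
    echo "requirements changed; run pip-sync here"   # the expensive, rare step
    printf '%s' "$current" > "$STAMP"
fi
```

Because the stamp file caches the hash, every startup after the first is just one `sha256sum` plus a string compare, which keeps the tooling feeling instant.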
This might seem like lunacy, but I really like/recommend Ammonite instead of Python/Bash.
It's Scala, runs on the JVM, and is perfect for writing scripts. (It has a great built-in dependency resolver. It uses Ivy, but it downloads the dependency by itself; you just import it via its "maven coordinate": http://ammonite.io/#IvyDependencies )
It gives you a lot more safety/correctness than Python, and it's a bit simpler to install too. (No need to compile extensions, just get JDK8 and it'll run.)
The solution to this (at least the one we've landed on at work) is to make sure your dependencies are packaged into a yum repo you include on your systems. For us, that's a local private yum repo our systems have access to, into which we package Perl module requirements that aren't in the public repos. We also include our private libraries there. If the utility script is commonly enough used, we'll make an RPM for it as well, or stick it in one of our general-purpose utils RPMs and make sure its dependencies are set. If that's done, you don't have to worry about dependencies at all; if not, you might have to manually yum install a few things that are grabbed from our yum repo.
There are lots of ways to handle this problem, but if you're handling lots of systems, you presumably already have a method you use to keep them up to date and secure. You presumably are also installing Python from the system packages (if not, you probably shouldn't be writing system utils in it unless you can ensure it's the same on every system you guys maintain, in which case your dependency problem shouldn't be a problem), so tie into that mechanism. It's a lot easier to reason about when there aren't two competing systems, and presumably you aren't going to do away with the security updates the distro provides.
While I can understand your pain with Python dependencies, I still can't wholeheartedly endorse that approach. Depending on the case, bash scripts are valuable and should be used instead of Python; in the wrong use cases, though, they can be painful for other developers.
I recently received a script like that from a partner company, used for forwarding data to their API. It was quite long and had a few dependencies that were not visible until you (stupidly) executed it.
A few random thoughts:
- Bash scripts can be run in environments where not all binary dependencies are met. In these cases the script might cause damage if it assumes everything is available.
- When someone is unexpectedly required to modify the script, it can be difficult or cause issues when this is done by an inexperienced developer (in this age I wouldn't be surprised)
- If the script relies on a program needing to be a certain version to get the wanted results, it may cause issues
- The environment where the script is run is usually not a vacuum. Other scripts might change environment variables or change/remove programs in general
While dependencies with Python can cause issues down the road, the trade-off is having some sort of control, as long as you don't execute other binaries directly.
This is why I've switched to writing "quick wins" in shell [or Go]. It's just so much nonsense that has nothing to do with actually programming. Posix shell can be a bit baroque, but you know that it's not ever going to change and because of that, it's pretty easy to ship to any *nix.
There is the question of the dependencies of a shell script, but I find in practice just checking for deps like `curl` at the beginning leads to be a better user experience. It's unlikely that there is going to be a ton of tools you require, and the tools you do require are probably going to be good about backwards compatibility [curl again as an example].
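That dependency check at the top of a script can be as simple as the following (POSIX sh; the tool names in the loop are illustrative, swap in curl, jq, or whatever your script actually shells out to):

```shell
#!/bin/sh
# Fail fast with a clear message if a required tool is missing.
for dep in grep sed; do        # list whatever your script depends on
    if ! command -v "$dep" >/dev/null 2>&1; then
        echo "error: '$dep' is required (e.g. apt-get install $dep)" >&2
        exit 1
    fi
done
echo "all dependencies present"
```

`command -v` is the POSIX-blessed way to do this; `which` is not standardized and behaves differently across systems.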
Except it does change, all the time. There are innumerable differences between the OSX, BSD, GNU, and other versions of common command line tools. There are plenty of cases where `jq` will or will not be available. Finally, there are differences in how `/bin/sh` will interpret things (though there shouldn't be) depending on whether the underlying shell is ksh, zsh, bash, dash, etc.
> There are plenty of cases where `jq` will or will not be available.
Sure. The argument is that it's a lot easier for the user of the program to read an error message that says "jq is required. Run apt-get install jq or brew install jq" than to fuck around with the Python or Ruby ecosystem, especially if they don't work in those languages.
> Finally there are differences in how `/bin/sh` will interpret things (which there shouldn't be) depending upon underlying shell is running ksh, zsh, bash, dash
Do you have an example of code that is written to the POSIX standard of shell that runs differently? I only write POSIX shell, and use https://github.com/koalaman/shellcheck to verify that to prevent that exact thing.
I generally agree with your sentiment here, but be careful with assuming bash==bash
There are differences between versions. I can't even remember what they are off the top of my head like I used to, which makes them all the more aggravating to discover again.
But I would recommend sticking to a subset of bash, not any of the new fancy features like 'globstar' which allows recursively globbing.
There are tools to manage these kinds of tests, like bashenv. But you're in the same problem scope at that point.
I very much agree with the sister comment, and I write my shell scripts for /bin/sh as well. There is this wonderful tool called ShellCheck ( https://www.shellcheck.net/ ) that checks that your script is actually POSIX-compliant if it starts with #!/bin/sh
POSIX shell is miserable for programming anything beyond a couple lines. It doesn't even have arrays[1], so you have no available container types within the interpreter itself.
[1] Well, it has $@, which you can use as a general-purpose array with some hacks[2], but that's no way to live.
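For the curious, the `$@` trick works roughly like this (POSIX sh; the fruit names are obviously just a demo):

```shell
# POSIX sh has no arrays, but the positional parameters can stand in for one.
set -- apple banana cherry     # "assign" the array
set -- "$@" date               # append an element
for item in "$@"; do
    printf '%s\n' "$item"
done
echo "length: $#"
```

It works, but there's only one of it per function scope, which is why it's "no way to live".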
I don't think it's too unreasonable to assume that you'll be able to find bash anywhere you'd find a general purpose python installation and it has plenty of niceties.
But even the nicest shell doesn't solve the dependency problem like statically compiled programs. If I could take my currently running Python code and produce some artifact that would run with nothing other than the python binary I think we'd be in a much better place.
Ohh apparently all I've needed in my life is zipapps.
Agreed that Bash is (relatively) fine, although error prone. My comment was about POSIX shell, which has none of the features (arrays, [[ instead of [, etc.) that make programming tolerable in Bash.
One drawback is that if you want your Bash script to work on macOS, you need to restrict yourself to features that exist on version 3.2 (from 2006) because that's the latest version that will ever be included on macOS by default.
> If I could take my currently running Python code and produce some artifact that would run with nothing other than the python binary I think we'd be in a much better place.
That's a matter of opinion. I don't find using "$@" to be a big deal in practice.
Let me put it like this: I'm a programmer. I don't mind making programming a bit harder for myself if it means that I get to avoid a lot of the non-programing minutia that's part of a modern interpreted environment.
Also, if you're willing to take a dependency on jq, the issue goes away completely.
Most of the time, you don't need all that, since Python has zipapps. You define deps, you zip it, you ship it to any machine with the same OS and the same Python version. It embeds everything and just runs.
We even now have nice tooling to automate the bundling.
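For the record, building a minimal zipapp needs nothing but the standard library (the app name and contents here are illustrative):

```shell
# Bundle a package directory (with a __main__.py) into one runnable .pyz.
cd "$(mktemp -d)"                  # scratch dir for the demo
mkdir myapp
printf 'print("hello from a zipapp")\n' > myapp/__main__.py
python3 -m zipapp myapp -o myapp.pyz
python3 myapp.pyz
# hello from a zipapp
```

Third-party dependencies can be vendored in with `pip install --target myapp -r requirements.txt` before zipping, which is roughly what tools like shiv and pex automate.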
I agree, but it's still way easier than the original story, which is the one you also have with PHP, Ruby, JS, etc.
Using an interpreted language always leads to this.
I know of no popular interpreted language with a seamless experience for shipping a standalone exe.
In fact, Python is probably the one with the best story here, since it has Nuitka (https://nuitka.net/), which can compile Python code into a fully standalone exe.
But then you need to install a compiler, headers, etc. And no cross compilation of course. Not to mention on Linux, you have to ensure you target the lowest version of libc you can.
You are still very far from Go or Rust, and I'm hoping one day that RustPython will succeed because that would mean an amazing deployment story.
Meanwhile, you trade the ease of deployment of compiled languages for the ease of development of interpreted ones.
I think it's a fair trade for most people: you develop the program much more often than you deploy it.
That doesn't mean we shouldn't work, as a community, to improve the deployment story. It's a serious hindrance.
Rust has a fantastic deployment story: compiling a rust program is super easy, and you can cross compile. Using cargo and rustc is a breath of fresh air compared to any similar experience with C compiling.
So if one day RustPython gets compatible enough with CPython that you can use it as a drop-in replacement, you could create a tool that compiles the Python VM for any target and brings your program along with it. Making a standalone version would become much easier.
Right now, doing so either requires you to bring in a pre-compiled version of CPython for your target (which is what Briefcase does) or compile the thing yourself with gcc + headers + deps (which is what Nuitka does).
> So if one day RustPython gets compatible enough with CPython that you can use it as a drop-in replacement
I don't think this will ever happen unless the community converges on a standard C-extension interface. Presently Python leans so hard on C-extensions, but there is no standard interface--if you're writing a C-extension library, you just depend on whatever obscure corner of CPython that suits your purpose. If you're writing an alternative Python interpreter, you have to implement the entire surface area of CPython, which generally means you must implement CPython exactly and you are severely restricted on the improvements you can make. At that point, why even bother?
Fortunately, I think there are emerging candidate interfaces, but the community needs to either update C-extension packages to use those interfaces or support packages (and maintainers) who already do. https://github.com/pyhandle/hpy.
There are probably only a dozen popular C extensions that need to support HPy to reach the tipping point of mass adoption: numpy, scipy, pycuda, tensorflow, matplotlib, uvloop, etc., and some DB drivers.
The rest aren't popular enough to be a blocker. You will hear them scream a lot, but they will be like 0.00001% of the user base, and we can just tell them to stay on CPython with its limitations. They don't lose anything; they just don't gain anything either.
Those C extension authors are in direct communication with the Python core devs, when they aren't core devs themselves, so if HPy is adopted, we can expect total adoption within 5 years.
The numpy authors have already said it would take a year to adopt.
Given the huge number of benefits of HPy, I deeply hope it will be a success.
I'm not sure. I would certainly add psycopg2 to that list, since it's really the only well-supported way to speak to a Postgres database via Python. I imagine other database dialects will have similar issues. And there's probably a whole host of other prominent libraries that we're just not thinking about because we only run into them when we're trying to use something like Pypy, and even then we only run into one or two at a time before giving up and going back to CPython.
youtube-dl for example is distributed as a zipapp and it seems to work just fine. It only requires you to have Python installed on your system, which isn't too burdensome a requirement on macOS/Linux. On Windows they do actually distribute a Python interpreter.
As usual with extensions, you are not using Python anymore, but a compiled language. To get 100% certainty, you'd need to compile the whole thing.
That being said, a lot of extensions are pre-compiled and provided as wheels, which is the case for tensorflow (I don't know about CUDA; I can't test on a laptop without a GPU).
Let's see what this means:
$ py -m venv test
$ test\Scripts\activate
$ pip install tensorflow
$ code hello_tensor.py
import tensorflow as tf

def main():
    with tf.compat.v1.Session() as sess:
        a = tf.constant(3.0)
        b = tf.constant(4.0)
        c = a + b
        print(sess.run(c))

main()
- it will only run on the system this particular wheel has been designed to run on. In my case cp38-win_amd64.
- it will come bundled with tensorflow, which is a behemoth, meaning your hello world .pyz will be around 500 MB.
- it needs to unzip, so the first run will be REALLY slow
For something like this, I would advise a more generic deployment tool, like fabric 2 if it's remote, or a make-like tool such as doit if it's local only.
Zipapps are an order-of-magnitude improvement in the Python world, but there are still lots of other major pain points, like dependency management and performance, which still leave Python well behind its competition. Hopefully these things change going forward.
It looks like zipapps built with “shiv” need to extract the contents of the zip file to disk before they can run? Does it delete the extracted files on exit?
If so, the extraction is going to make startup very slow. If not, that’s just messy. Either way, it’s not ideal.
But it beats shipping your entire dev env to the server.
I find it a good compromise. The extraction is done in $HOME/.shiv/{zipappname}_{zipapphash} so it's not a horrible mess. But if your project is big, you do have to clean up old installs because they can eat a significant amount of space.
I probably haven't bought into all of poetry yet but for deployment, I have been using "poetry export" to get the pinned requirements.txt, commit it to the repo and install to a virtualenv. A bit of work to keep it in sync with the poetry dependency file but that's ok.
For PATH with cron or others, I use the full path to the virtualenv such as /path/to/project/.venv/bin/python. The path can be extracted by "which" or "Get-Command" when the venv is active.
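That crontab entry then looks something like the following (paths hypothetical); because the venv's interpreter is called by absolute path, cron's minimal PATH never comes into play:

```shell
# m h dom mon dow  command
0 3 * * * /path/to/project/.venv/bin/python /path/to/project/job.py >> /tmp/job.log 2>&1
```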
Using a python version different from the system python version is probably the messiest part but well, targeting 3.6 is alright.
I do agree it could be better and it's not quite as streamlined as other ecosystems.
Honestly, `pip freeze` includes the whole content of a venv's site-packages, with exact versions. For most projects, that's equivalent to all the dependencies recursively pinned with poetry, although you don't get the clean pyproject/dev-prod/lock file separation.
So a huge number of cases can be handled with just that. It will be "reproducible" enough for a lot of people.
> although you don't have the clean pyproject-dev-prod/lock file separation
That's why I use "poetry export -f requirements.txt > requirements.txt" instead of pip freeze. It only exports prod requirements from the poetry lock file.
Strange, this does seem to work with python3.8 on ubuntu 20.04 (the site-packages shows up in sys.path), but for me in a virtualenv bin/python is a symlink to the system python, so how does python 'know' what path to use? Is there logic baked into the interpreter?
I seem to recall that with python2.7 that calling bin/python in a virtualenv without activating the virtualenv did not used to "work" (i.e. it would use the system packages). Did this change at some point or is my memory just wrong?
If the path to your executable is fixed, just put it in the shebang and you're done - makes everything way more explicit at the cost of some dynamic behavior.
An anecdote: Homebrew uses this method for shipping python executables.
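A sketch of that fixed-shebang pattern (the venv path is hypothetical):

```shell
# Bake the venv interpreter into the shebang; no activation step required,
# because the interpreter's own location determines sys.path.
cd "$(mktemp -d)"                  # scratch dir for the demo
cat > mytool << 'EOF'
#!/path/to/project/.venv/bin/python
import sys
print(sys.executable)   # reports the venv interpreter, not the system one
EOF
chmod +x mytool
```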
The "production version" of your script should be running in your system environment with system packages. pyenv and friends should be used for testing with different versions and making sure you don't accidentally depend on idiosyncrasies of your box.
The exception is if your python thingy is "the main thing" running on a server, i.e. your customer facing webapp.
I once threw a relatively complex Python application with background server/client processes at Cython and the generated .exe literally just worked without any special effort. I don't know how transferable that is, but N=1 it's not always as hard as what you're thinking.
In other words, it becomes the concern of the person shipping the code, rather than the concern of the person trying to run the code. That's exactly how it should be.
Are people still suffering through hosting Docker containers on Windows? Why would anyone do that at this point other than to comply with outdated, arbitrary IT policies?
You also can't run any program without a computer and an OS; there are some basic prerequisites to running software. Having a Docker/container host has become one of those prerequisites for many applications, but it actually reduces the headache of numerous other traditional prerequisites.
I don't want to have to run a simple Python program in a container for quick and simple development or testing. That's a failure of engineering discipline. By all means, do provide a Docker container and do use containers for actual deployments, but also make it easy for me to just use, say, pip-tools or whatever else your organization has standardized on for Python. If we're talking about something with complex C or C++ dependencies that's quite different. If it's just a few pip dependencies and there's no way for me to just run it reliably outside of a container, though, that's a result of not following best practices.
Agreed, I typically include a README as well as a requirements.txt so one can easily 'pip install -r requirements.txt' and then 'python app.py' to run simple apps without a bunch of rigamarole.
I constantly use Docker in my job and projects, yes.
Yet I do not believe, nor advocate, that it gets rid of the complexity.
Depending on the user's needs, your dockerized application will run on a different base distro. Alpine and musl for a small OS footprint? Or Debian (or debian-slim) for glibc compatibility?
Those concerns are the same with or without Docker. Docker makes things easy, just not those things, because that is not its purpose.
I typically specify these things in the Dockerfile - if the end user wants to modify the Dockerfile because they prefer Alpine over Debian... they've now taken responsibility of maintaining their customized Dockerfile and ensuring that everything runs as expected. This doesn't seem like something that would be encountered with any frequency in my experience, and you would technically have the same problem with or without Docker in the mix.
In the professional world, your end user is either:
- someone without the skills to write a Dockerfile
- another team that isn't responsible for integrating your work
The packager of an application is part of the project's team. It's not up to the user to package your application.
- Doesn't require any build steps or extra hoops if you're fine with skipping static types
In general it just does a really great job isolating from the environment. No messing with environment variables, most things even run fine on Windows out of the box. All you need is node itself installed and you're off to the races, whether you're starting a new project or running one you checked out from github.
For a trivial-ish command-line tool, I've enjoyed using pyinstaller with --onefile to put out a single-file executable. Using GitHub Actions, it was also relatively easy to create cross-platform releases.
It's evident from reading the OP and previous similar posts on HN that many developers find it difficult to specify and replicate deterministic Python environments for their applications. Personally, I have found it best to use (a) a virtualenv or conda environment, with (b) a requirements file that specifies fixed version numbers for packages (e.g., `pandas==1.0.3`). Only very rarely have I run into issues doing this; it works quite well for me.
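Concretely, that workflow is just the following (the pinned package is illustrative, and the actual install is a network step, so it's left commented in this sketch):

```shell
# Pin exact versions so every machine resolves the same environment.
cd "$(mktemp -d)"                  # scratch dir for the demo
printf 'pandas==1.0.3\n' > requirements.txt
python3 -m venv .venv
# .venv/bin/pip install -r requirements.txt   # replay the exact environment
test -x .venv/bin/python && echo "venv ready"
```

Any other machine repeats the same two steps against the committed requirements.txt and ends up with the same package versions.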
--
That said, from a security standpoint, I'm not sure it's a good idea to run a script downloaded from the web, without verification, on your local command line:
curl https://pyenv.run | bash
If that URL ever gets hijacked, you would be running malicious code. At a minimum, you may want to take a look at the script before running it, or otherwise verify that you're downloading what you actually want to run.
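The safer pattern is to fetch to a file, read it, and then execute deliberately. A sketch, using a local stand-in for the download (with a real URL the first line would be `curl -fsSL https://pyenv.run -o installer.sh`):

```shell
# Download first, inspect, then run -- never pipe a URL straight into bash.
cd "$(mktemp -d)"                  # scratch dir for the demo
printf '#!/bin/sh\necho "installer ran"\n' > installer.sh
head -n 5 installer.sh             # read before you run
bash installer.sh
```

If the project publishes a checksum, comparing `sha256sum installer.sh` against it before the final step is an easy extra safeguard.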
I think `curl | bash` is treated unfairly. Whether you `git clone` or `curl` a script, you are fundamentally doing the same thing: downloading and executing code from the internet. `git clone` just feels safer because it is hiding that fact under layers of abstraction.
If I want to run pip, I need to trust the PyPA. It's their code I want to run, and I need to download it one way or another. If I don't trust them to keep their domain secure, I don't see why I would trust them to keep their GitHub repo secure.
And the whole point of pip is to download code from PyPI and run it. pip, git, curl|bash all do the same exact thing in this case. curl|bash just smells funny because it makes it more plainly obvious what is going on.
Don't you get a bunch of incompatible packages when you restrict to specific fixed version numbers without indication? I guess this is only helpful if you don't plan to reuse your code in another project.
Good question. The approach I proposed works for production applications that require an easy-to-replicate, deterministic environment, but I wouldn't recommend it if you're trying to build, say, Python packages or frameworks meant to be used in diverse environments.
Perhaps this is just my own inexperience showing but I haven’t ever had an issue using venv (which is included in python3 now) and a requirements.txt file.
I think that at this point all the ceremony around setting up and deploying a Python project outweighs its 'easy to read and use' aspects. Unless there is a library you can't live without or rewrite, it seems like a language with better tooling and the real benefits of a type system is a better choice.
If you're mainstream: Go or Java.
If you're edgy: Nim, Scala, or Crystal
All of those have much more sane type, build, and packaging systems.
@perl-people, was this a solved problem when Perl was big? Or is python walking the same roads?
It most definitely was not a solved problem when Perl was big, as far as I know Ruby is the language that finally solved it with bundler, which was released after Rails was, so that's like 2007 or something.
The recipe for the 'solved' deployment:
- a compiler that ships with a build tool (so building is homogenous in the community)
- a centralized or at least uniformly accessible package repository (so dependency acquisition is homogenous in the community)
- a common file format for describing dependencies (so dependency resolution is homogenous in the community)
- a common file format for locking dependency versions (so deployment can be done reliably without vendoring dependencies)
- optional but very nice: a tool for managing compiler versions so it's easy to switch/upgrade projects
Any programming language that has all of these boxes ticked is a modern programming language in my book. As far as I know Ruby is the first that ticked all of them, but other ones I've used that have this: Node.js, Go, Rust, Haskell, Python (though it's a bit messy). I'm pretty sure C# checks them as well nowadays, but I haven't used it in over 5 years so I'm not sure. Same for Java.
Somebody needs to tell the Racket people about this. Dependency management in Racket is still C-style no-version-pinning-anything-goes. The third party dependency managers (see below) are all primitive and do not support multiple versions of the same libraries which is a standard feature in Go/Node.js/Rust/Java class loaders
I’ve been toying with Nim lately to wrap C++ ML libraries, which gives a nice Pythonic syntax but compiles to a binary wrapping the ML lib. Seems to work well for Torch. There's a nice wrapper library nimtorch for Nim [1]. It's a bit out of date but would be easy to update, probably. Well, easier than bundling pytorch on an embedded device. Even manually wrapping the needed C++ libraries isn't that hard in Nim, IMHO.
Overall looking at Python deployment story after using Elixir for the past couple of years makes me cringe a bit. Rather I have no idea how to do it. Deterministic versions, lock files, and container/tarball (or binary) support seems a given in 2020.
Yes and no, Pytorch (like Tensorflow for Python) is just a wrapper around a core C++ library. So you can compile a binary from C++ linking to the pytorch libs without including Python. Technically it's pytorch, but without Python and Python dependency management, which is way simpler. Nimtorch lets you wrap the C++ API of pytorch with nice Pythonic-looking code, a GC, but using only C++ and possibly statically linked. Win-win.
Yes I understand pytorch quite well - I was merely confused because I consider the C++ library to still be pytorch.
I don't see the advantage of this setup over just loading a torchscript model in C++ or any other static language. A full set of bindings seems unnecessary unless you need to train in nim.
True, my terminology was a bit outdated as I keep thinking of pytorch as just the Python wrapper on libtorch. That's not really true.
I prefer to avoid Python nowadays due to the pain of dependency management; as nice as this article is, I don't care to learn about Poetry. So training in pure C++ (or better, Nim) is my preferred setup. Keeping a similar build setup for both training and deployment saves a lot of headaches, which is why the nimtorch interface is handy even if it's not as full-featured/up to date. For now I'm only deploying a very simple NN without much need for experimentation.
It does, if your extensions are provided as wheels. In which case the resulting pyz file will be runnable on any machine using the OS those wheels are compiled for.
Yeah, shiv works OK. I've run into some stupid issues with the self-extracting directory ending up with too-long path names on Windows and such, so it's not perfect.
I have to say, at first blush I would not choose click over argparse. I do have to look up the docs of argparse every time I use it, but I like that it just gives me the args and lets me structure the program flow how I want to, which I think is more natural.
And then I can do things like import a big package (pandas) only after parsing the args, which is highly convenient to users that want to check the argument options without a five second lag.
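That lazy-import trick looks roughly like this (a sketch; the `csv_file` argument and the pandas usage are just illustrative, not from any real CLI):

```python
# Sketch of the lazy-import pattern: `--help` returns instantly because
# pandas is only imported once we know we actually need it.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Toy CSV summarizer")
    parser.add_argument("csv_file", nargs="?", help="path to a CSV file")
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.csv_file is None:
        build_parser().print_help()
        return
    # Heavy import deferred until after argument parsing.
    import pandas as pd
    print(pd.read_csv(args.csv_file).describe())

if __name__ == "__main__":
    main()
```

Running `--help` never touches pandas, so there's no multi-second startup lag just to see the options.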
I recently wrote this (https://github.com/jamesob/clii) because I can't stand click and got sick of having to check the argparse docs every time I wanted to write a CLI. I guarantee you'll spend a tenth of the time trying to figure out how to use this thing, it has no dependencies, and is implemented in a single vendor-friendly file.
I agree; click's approach is like some kind of old-school 4GL that tries to automatically create GUI elements from your database tables, except mapping CLI to functions in your modules. People should be putting enough thought into their CLI that Click doesn't really help them much.
I have a really high usefulness threshold for adding external dependencies to Python projects. If you can get away with never getting into the virtual environment mess, distribution/installation/development becomes _so_ much simpler.
Of course, sometimes you can't avoid external dependencies (personally, this often involves pandas). But the standard library gets you really far. And even though urllib.request is clunky, I will only use Requests if something else already is forcing me to add external dependencies.
I definitely agree for production-grade applications. Most of the time I'm using python, though, its to script a task I need to do or throw together a small app on a raspberry pi at home. In those cases, I have my own "standard library" of packages I always have installed system-wide, like requests.
Unless I am making a "real" application, I do my best to avoid virtualenvs altogether.
You are not alone. __main__.py in a zip has been supported literally since Python 2.6, but very few people knew about it. Eventually, in 3.5, the zipapp module was released to make the feature more discoverable, but again it has been mostly ignored.
I think the reason is that something like shiv was missing: it streamlines the process, shows a finished product instead of just telling people all the things they can do, and the automatic unzipping solved plenty of problems that you had with alternatives like Pex, especially on Windows or with static resources.
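For reference, the stdlib zipapp workflow really is tiny. A sketch that builds a runnable archive from a throwaway directory (the app name and contents are made up):

```python
# Build a minimal, runnable .pyz archive with the stdlib zipapp module.
import pathlib
import tempfile
import zipapp

# Throwaway app directory containing a __main__.py.
src = pathlib.Path(tempfile.mkdtemp()) / "myapp"
src.mkdir()
(src / "__main__.py").write_text('print("hello from a zipapp")\n')

target = src.parent / "myapp.pyz"
zipapp.create_archive(src, target=str(target),
                      interpreter="/usr/bin/env python3")
# Now `python myapp.pyz` (or `./myapp.pyz` after chmod +x) runs __main__.py.
```

Dependencies can be pip-installed into the source directory before zipping; that's essentially what shiv and pex automate.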
Urllib and the surrounding web-API modules have made such great strides (and are built into 3.x) that Requests isn't needed in almost all cases these days - I find some of the error handling it papers over at lower levels to be more problematic than useful.
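A stdlib-only JSON GET isn't that bad, for what it's worth (a sketch; the `get_json` helper name is mine, not from any library):

```python
# Fetch and decode a JSON response using only the standard library.
import json
import urllib.request

def get_json(url, timeout=10):
    # urlopen returns a file-like response; json.load reads it directly.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

No sessions, retries, or connection pooling, of course - if you need those, that's when Requests (or urllib3 directly) starts earning its keep.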
90% of the time the only lines in my requirements.txt file are for PyYAML and Jinja2.
Using just the Python standard library is the ideal case for scripts and other sysadmin-ey tools - no dependencies, runs everywhere (tested in multiple Python versions via tox) in a single source file.
I think it's a very enthusiastic post. Anything can be over engineered - and generally young enthusiastic programmers are keen to learn about the options.
I'd be tempted to clone the structure of the posts, and replace the images with some stock photos of supermodels lying down on modernist furniture with cobras and other snakes wrapped around them or lying on the floor nearby.
It's not the language Python, it's the... how to build a project that others can use. It's the scaffolding that 'just' knowing how to program does not teach you, but working in a real environment forces you to learn.
Just skim-read but it seems to cover the sensible parts - all of this applies if you are writing 100,000 lines of code or 3 lines of hello.py
I am trying to write thedevmanual.com which is basically all of that - what it takes to run in real life. it is of course opinionated, and in lockdown :-(
I am not into Python, but the article lost me before it even started: you are required to install a bunch of compiler tools on your device to be able to proceed.
Excuse me? Is this really what hypermodern Python looks like?
No. Python is nice since it just works. You make a 10 line .py file, and it's super-simple.
This article lists a set of bleeding-edge tools, should you choose to add them and learn them all in one place. I wouldn't use half the tools (they're too hypermodern), but it's helpful to know where things are moving.
It’s been a long time since Python just worked. You can’t just hand someone a Python script and expect it to work on their machine. Yes, if you keep things simple and avoid all dependencies except the most trivial maybe you can get away with that. But that’s not going to be the case if you’ve got whole teams using Python.
When I run teams doing Python development, I'm hyper-disciplined about avoiding unnecessary or bleeding-edge dependencies. My experience is that dependencies save time in the short term, but lead to exponential maintenance costs in the long term. I view each dependency the same way as technical debt.
I also generally don't lock versions on dev machines; code should use the core, supported API, and not break on bleeding-edge functionality and API changes. I lock version on deploy machines, obviously.
Smart people on my teams don't like this approach, though, so I could be wrong.
But you can do Python this way. And beginners definitely should start by doing Python this way.
I'm sympathetic about bleeding-edge dependencies, but just handling fairly mundane Python dependencies is really easy with venv, pip-tools, and good standards across projects. Of course you always have containers for actual deployments.
When you say beginners, I think it depends on whether you're referring to programming neophytes in general or professional developers who are new to Python specifically. In the latter case, I actually think it's really important for newcomers to Python to get into best practices like this very early on -- indeed, pretty much immediately. Otherwise they're going to end up either being unable to use any interesting dependencies or being unable to distribute their work in a way that is easy and convenient for others to hack on. Do this with a bunch of people simultaneously and it's a big problem.
1) There's a world of difference between that and docker, and especially docker with containers for not just postgresql, but a half-dozen specialized data stores, queuing systems, MTAs, etc.
2) There's also a world of difference between having numpy / pandas / etc. in your requirements.txt, and having those pinned to a specific version. I'm okay with one or two pinned dependencies on any specific project (for example, if there's an overall project built on Django).
But if you're using the corners of standard libraries in ways where version 1.65 works and 1.73 doesn't, you're probably doing something wrong. You're probably using features which are too bleeding-edge. I'm okay with a few conditionals in code too (if library is 1.65, do X, and if it's 1.73, do Y).
When I've seen systems that depend on nuances of specific versions, upgrades turn into "migration to [library] 1.73" and eat up weeks of developer time. It gets worse when you have cascades (upgrading library X means upgrading Y, etc.).
And goodness help you if you want to integrate two systems built in docker with pinned everything and fine-grained dependencies.
A lot of this also comes back to willing and able to say "no" to features which take 15 minutes to introduce, but cost time down the line to maintain.
Systems which install on Ubuntu without virtualenv or pip (just apt-get installing packages) are an ideal I strive for. It's usually one I don't hit (and it's also not how I develop, obviously -- it's not for me, but for my users, as well as for the discipline).
I can’t tell if we’re in agreement or disagreement. I don’t disagree that one should avoid exotic dependencies or unstable behaviors from specific versions of libraries. Version pinning is more about just making sure someone else can run the program. It’s not about (or at least shouldn’t be about) creating a reliance on odd corner case behaviors. We almost never manually pin versions — pip-tools does that automatically.
Your argument probably would be that the dependencies used should be so simple and core that the risk of it not working with someone else’s package set should be minimal or zero. That’s just a bit too extreme for my taste. I want builds to be 100% reproducible. This is exactly what modern build tools for other languages do.
Re: Docker, I don’t think anyone is claiming pip and virtualenv are somehow a replacement for that.
Re: apt-get, we tend to actually avoid this. It’s really not a good package manager at all and can easily break. We’re going in the direction of nix instead and may even port our entire Python workflow over to it or bazel at some point.
(1) I want builds to be 100% reproducible on deployment servers and on CI/CD pipelines. Otherwise, you can get undebuggable Heisenbugs. On the other hand, I don't want builds to be reproducible between developer machines. If I'm running Python 3.6 on Ubuntu, and another developer is running Python 3.7 on a Mac, and we have slightly different versions of numpy, that makes sure the system is not too brittle. Come to think of it, if I had infinite resources, I'd have several build machines with different (reproducible) configurations.
(2) I'm a lot more spartan about dependencies than other developers I've met.
(3) I'd never use apt to manage Python packages myself in something I'm working on. The constraint is in the other direction. If I build a tool, a user ought to be able to install it using apt in some future version of Debian, and likewise for other systems. Even if that's an abstract user.
I've found that if I develop this way, the upsides outweigh the downsides, especially over extended periods. A lot of software gets built like a system which can only live in one place. There's a set of AWS machines, code on them, and that's the system. There might be a few copies of it (stage+dev+etc.), but you can't move it somewhere else. I like systems I build to be portable. Someone can bring them up and running on their own machine, ideally in a few minutes. I've always found that to be cheaper, in the long term.
I feel like your idea in (1) is not that unachievable with finite resources. It depends how far you took it but requiring tests to pass in a few mild perturbations of the target environment wouldn’t be that expensive in a lot of cases and not even that hard to set up. Sounds like this deserves a name like “perturbative testing” to me if it doesn’t already have one.
In abstract, it doesn't take a huge amount of time and resources to do that. But, there are probably around a hundred higher-priority ideas achievable with the same resources which would take priority over this on the projects I'm working on right now.
On projects I've worked on before, I think this would have made sense /technically/, given project priorities, but so did many other things which weren't done. It's a lot easier to make the case for resources for customer-facing features than for technical debt or infrastructure. So there's the political component too, which varies organization-by-organization.
This is already done in a lot of projects with hardware. The Linux kernel will run on a thousand hardware and software configurations before integrating features.
If I did this, I'd probably want at least three builds:
* my pinned deployment versions (sometimes a release or two behind, sometimes bleeding-edge)
* latest released version; and
* HEAD
If an upstream project is introducing a breaking change, I'd know immediately. That'd be super-helpful, probably both to me and to those projects.
Come to think of it, the right way to do this might be to have three virtualenvs on my local machine, rather than just different targets in CI/CD....
Those are needed to compile different versions of Python with pyenv.
Usually distributions bundle one or two specific versions of Python. Pyenv makes it super easy to install and use all the versions of Python that you want.
Even though for most people it might be enough to just use whatever version of Python comes installed with your system, for a team it might be important that everyone has the exact same version.
Moreover, pyenv-virtualenv makes it painless to use virtualenvs and so I recommend you give pyenv a try even if you do not need additional Python versions.
Again, do these tools really require a C compiler toolchain? If so, they are completely out of the question - they add much more complexity than they could possibly fix.
Why not just install the required version of Python, maybe from a 3rd party repo if not available in the main repos? Why would I ever want to get a compiler toolchain to get my Python interpreter?
Could you explain why it is such a deal-breaker in your opinion?
It is a single `apt-get install` (which probably downloads less MBs than our typical `npm/yarn` install).
Consider that if pyenv came pre-packaged for ubuntu/debian, those packages would be runtime dependencies and then you would just need to `apt-get install pyenv`.
Can't you simply download Python 2.7, 3.6 and 3.7+ to separate folders, and use those in your different projects?
I understand that virtualenv and maybe even pyenv are useful if you need different requirements for different projects using the same version of Python, as apparently pip installs packages globally. But for your setup, I don't get why something like pyenv really helps...
> Can't you simply download Python 2.7, 3.6 and 3.7+ to separate folders, and use those in your different projects?
Yep, most people can just do this. Then, they want a little script over the top that downloads the different versions of Python for them -- just to make life easier for them. Wouldn't it be handy to also script the installation? It'd also probably be useful to automatically setup the version of Python I want to use when I switch folders, so I'm not constantly running the wrong version when I change projects.
If you don't need to use multiple versions of Python, like me, then you don't need it. Just use the built-in venv module in Python 3 to create and manage virtualenvs. For example, simply run `python -m venv .venv` to create a new virtualenv in your project dir, and VS Code will automatically recognize it when you add the project dir to your workspace.
But having a C compiler toolchain available is pretty much standard practice when using Python (or Node.js, Ruby, etc.) because you might need to install some library that requires compilation from source (e.g. psycopg2). If you don't want that, you'll be stuck with installing those 3rd-party libs from your distro's repo, which might be out of date or outright missing (especially for less popular packages).
> Why not just install the required version of Python, maybe from a 3rd party repo
That's the thing. pyenv just downloads the source code of the specified Python version, and then compiles it on your machine. They did that to be agnostic of the OS you're running on.
My workplace got burned recently using this tool for some deployment reasons. I have to be honest, my experience with it in other ways wasn’t ideal. I won’t be using it again.
It’s a failure of the author to explain intent. The entire article is WHAT, not WHY.
If you are already hyper-familiar with python, you know what you are looking at. Yes, of course I’ll want to use pyenv, “everyone knows that”. For the rest of us, it just seems like a lot of steps to do - for some reason.
I think it’s different from just not being the target audience. A good tutorial explains intent.
Great read! I see a few approaches I'd be interested in adopting. Nox sounds particularly useful for tooling consistency across development environments.
It's worth noting that this is a rather opinionated toolset, and that we shouldn't mistake opinionated for hypermodern. You could replace poetry with pipenv (particularly now that it's being maintained again...), and I tend to prefer unittest to pytest. The further into it you get, the more opinionated it is -- for instance, code coverage support and CI/CD is not one-size-fits-all, but the choices of CodeCov and GitHub Actions are nicely illustrative.
Please no pipenv. A horribly managed project with worse design choices than Poetry. And a confusing name to boot (it makes it sound like a clean bridge between pip and pyenv, but it's definitely not). On top of everything you can throw in Ken Reitz's ego. Literally the only reason pipenv gained traction is because there are still an unnerving number of people in the Python community that think Ken Reitz shits gold, when he is clearly more of a one-hit-wonder (love requests btw).
Even the testimonials are self-important.
> Pipenv is finally an abstraction meant to engage the mind instead of merely the filesystem.
Like, really? I want to interpret that as a joke, but I don't think it is.
Poetry should be the future of Python dependency management. I'd like to forget pipenv ever happened. Unfortunately since PyPA took pipenv under its wing, that might not happen.
I often end up frustrated when jumping back from pytest to old projects that are still using unitest
I really like pytests's fixtures and it's caplog stuff. Out of interest, what do you prefer in unittest?
Wouldn't hyper modern Python rely on containerization for isolation and portability? Poetry and pyenv seem like incremental improvements rather than a qualitative leap forward.
Heh, but that's one modern thing I don't like. Someone complained about click adding to startup time, running a new container is certainly "modern" but tiring.
It's certainly overkill for a one man side project. But if you have a team and deploy to clustered servers, virtual environments sure start looking like a horse and buggy.
Yes, this! I'm not against the language evolving, but why do so many people seem to want to turn it into JavaScript? I've been using Python for the last 10 years, and when learning it went out of my way to make sure I was using the idioms of the language and not just use it as a different language with a Python syntax.
Those tools have existed for years, and as always with Python, they serve only a subset of the Python users.
Not that it's a bad article. If you don't know about them, give it a read, it's good to know it exists and the post is well written.
But don't think it's the ultimate stack or whatever.
And if you learn Python, you should always strive to master the basics first:
- installing Python from python.org for Windows and Mac, or via apt/yum on Linux (those ones are tricky; it's more than just the "python" package to get pip and venv)
- don't try to get the latest and greatest version (which is 3.8, or 3.9 alpha today). 3.6 is great already and is widely available. I personally strive for 3.7 now, but I'm happy with 3.6 if I don't have to use asyncio. So get the most modern version that is easy to install for you.
- be comfortable with "-m", the "py" command on windows, "pip" (-m, --user, install, install -r, freeze), and "-m venv".
- make sure you understand what PYTHONPATH is and how it works with regard to the import system
- know how to configure your favorite editor to work with those
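To illustrate the PYTHONPATH point above: the import system just walks sys.path in order, and PYTHONPATH entries land near the front of that list (the extra directory below is a made-up example):

```python
# `import foo` searches the directories listed in sys.path, in order.
# Entries from the PYTHONPATH environment variable are inserted near the
# front, right after the running script's own directory.
import sys

for entry in sys.path:
    print(entry or "(current directory)")

# Modifying sys.path at runtime has the same effect as setting PYTHONPATH:
sys.path.insert(0, "/opt/shared_libs")  # hypothetical directory
```

Once you can predict what sys.path contains, most "ImportError: no module named..." mysteries stop being mysteries.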
Once you have those basics, you can go and try whatever you like: poetry, pyenv, pew, virtualenvwrapper, pipenv and so on. It will be easy because you will have a solid understanding of where each comes from and how it plugs into the ecosystem. You will be able to choose whether they are worth it or not for you. And more importantly, when you have to move away from them (because of work, because they don't suit you, because they are abandoned...), you will always be able to fall back to basics.
But frankly, there are so many things that you should be learning before those extras: pdb, black, pylint, zipapps... Or even concepts like generators, decorators, unpacking, etc. They all have better value for the time you spend on them than learning a new "modern" stack.
Not that it's not useful to have a modern stack. I update my stack all the time.
I'm currently using pip + pew + doit + nox + pytest, and I'm experimenting with dephell.
But it's my full time job. I'm an expert.
Most Python devs have a much more limited amount of resources to spend on learning stuff. They have deadlines, and other things to care about than Python.
I know because I go from companies to administrations to students to train them, and it's always the same: they chose Python because it's an efficient use of their time.
Context is important. Developing applications vs libraries vs scripts is different, developing internal company software vs external software is different.
If you're shipping a package on pypi, you will want to develop and automatically test using multiple Python versions and maybe a range of versions of your dependencies. Having good tools to manage multiple sets of Python versions and dependencies and parallel automated testing is a godsend.
If you're writing an internal or web application that will run only in a single well-known environment, pip + venv + pytest might well be good enough.
Sure, but again, there is nothing modern about this. And certainly not, hyper modern.
Besides, if you want to develop and automatically test using multiple Python versions and a range of versions of your dependencies, none of the tools from the article will do it.
At best they are one of the ways to get dependencies before you use a tool to do it. You can do that with raw pip and venv as well.
What you need though, is something like tox, which has existed for years.
As mentioned in my comment, I personally use nox for this, as it is, ironically, more modern to my taste.
Nox doesn't care if you use poetry, pew, virtualenv wrapper or venv manually.
I would certainly advise learning something like nox or doit before learning poetry, for example.
Now I don't advise against learning poetry, mind you. It's a good tool, well written.
Pyenv is another matter entirely. The day someone really needs what pyenv gives them, to the point it's worth investing in it, they can discard all my advice entirely, as they will have the skill and experience to not need my explanations.
Not sure if you've read just the first article, it's a whole series, they cover nox later on.
I don't see how this stack is "hypermodern" either, mind you.
My point was just that these articles and other like it tend to omit context completely. What is this stack good for? Who should use it? Who shouldn't? Without answering these questions the articles are not all that useful, or even could be harmful, if a dev who doesn't know better apes a complicated stack for no good reason.
My team has struggled a lot with how to do python deployment.
We know we can isolate the interpreter and modules via venv/docker/etc. and get repeatable and reliable deployments. However, because we're a utility module, we like to allow users to import our code freely, and that's a lot harder when that code is isolated and bundled with very specific requirements.
Seems like the only perfect solution is to just support every conceivable version of python and our required modules. Which is of course very hard. It would probably require greatly reducing our usage of tensorflow and some other packages which change a lot and are trickier to use on Windows.
What's the benefit of using pyenv? I've been using `python -m venv .venv` to create virtualenv within the project directory for a while now, and VS Code recognize the .venv directory as python virtualenv and apply it to the workspace automatically. So far it's been quite painless for me.
Pyenv is for installing and managing Python versions on a machine. I don't think it makes sense to compare it to virtualenv, as virtualenv doesn't install Python; it only copies it from the system into the current directory. You could use both tools.
If you want to use different versions of Python itself, without installing them as different named binaries or using shell aliasing, etc. Helpful to install Python 2.7, Python 3.x, 3.y, etc. and invoke each as simply `python ...`.
They serve a different purpose. Pyenv enables you to maintain several Python versions. You can then use the venv module of each respective Python to create a virtual environment with a Python at a given version.
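As a sketch, the venv step is just the stdlib module; whichever interpreter runs it (e.g. one installed by pyenv) is the one the environment gets bound to. The temp directory here is only for illustration:

```python
# Create a virtual environment programmatically; this is what
# `python -m venv <dir>` does under the hood. The environment is tied
# to whichever Python interpreter executes this code.
import tempfile
import venv

env_dir = tempfile.mkdtemp(suffix="-venv")  # throwaway location for the demo
venv.create(env_dir, with_pip=False)        # with_pip=True also bootstraps pip
print("created venv at", env_dir)
```

So `pyenv local 3.7.7 && python -m venv .venv` gives you a 3.7.7 environment regardless of what the system Python is.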
I see, I think I confused it with pyenv-virtualenv. I've never needed to use multiple versions of Python, so I've never tried it. I always install the latest version globally and use a Python 3 Docker image as the base deployment image.
If you have multiple projects, it may be undesirable that you're forced to have them all on the same Python version. Of course it helps that Python has fairly good backward compatibility, and even using a newer version to develop and test than the one you'll deploy with is not the end of the world, but pyenv is easy enough that I see no reason not to do it properly.
I guess that there are many tools for environments now... my personal choice is using conda (I actually start with miniconda and download what I need from conda-forge).
It provides the needed isolation, you can get lots of packages without having to compile from conda-forge and it still works with pip for the odd cases where a given package is not directly available (as a bonus, it works for other native toolchains, not only Python, which is a huge plus for me).
The only thing I never understood about conda is why it does not support packages from PyPI.
I mean, the Python community has a standard repository for packages (PyPI). You cannot even name a library $something if $something is not available on PyPI. Why would I use something hosted by a private company that does not even interface with the rest of the community?
This is not an attack on conda. I just cannot understand its rationale.
The real problem (which is solved by conda) is that pypi doesn't solve dealing with non-python packages well (say, compile scipy and make all related native packages communicate well). This is a huge issue for people dealing with scientific packages (and the main pain point which pypi being just python-focused on installing on site-packages doesn't solve well enough IMHO).
Also, while it was initially done by a private company, I'd say it's definitely a community effort right now (it's also the reason I tend to use https://conda-forge.org/, which is community driven and not anaconda).
As for dealing with pypi, it does integrate well enough for me (given that you can just pip install packages on the python for which conda is managing the env), but yes, the other way around isn't true (conda solves a bigger problem than pypi up to the point that it's possible to even have non-python tools available -- one real use case example I have here is having innosetup binaries as a tool in the PATH in some conda env for doing builds).
Note: I don't have any affiliation with any of that, these are just my preferences for managing python envs (when I'm developing pydevd, which is the debugger engine used in pydev/pycharm/vscode, many times I need to reproduce some weird env and before using conda that was pretty annoying).
it’s not an abstraction over virtualenv at all; pyenv installs python interpreters, and those interpreters should use -m venv instead of virtualenv anyway.
did you mean pipenv instead? we suck at naming things (and pipenv has its share of issues, no arguments there).
All of these virtual environment managers over complicate dependencies with Python. Using a simple virtual environment is quite easy. All you need is Python, its built-in venv module, setuptools and pip. I've found pip-tools helps with generating/merging comprehensive requirements.txt files.
> This article series is a guide to modern Python tooling with a focus on simplicity and minimalism
I'm not sure using pyenv + poetry + click qualifies as a minimalist setup.
I have never had to use pyenv as I figured out it was more robust to install the python versions directly from python.org and create the appropriate symlinks. Then using the venv module which has been included in python since 3.3.
The argparse module of Python is not that bad once you are used to it.
I'm not saying the tools and libs mentioned in the article shouldn't be used, it's just that they're not mandatory for whoever wants to stay close to the bare minimum.
I would use pyenv locally so I can easily switch versions but dev and prod is a Docker container with one python version, then poetry install the virtual environment.
You don’t even really need to install into a venv, since you’re already in an isolated container, but going straight for a venv is kind of a reflex for most Python devs
Once you get in the habit, using pyenv/pipenv (in my case) is incredibly quick and easy. I use it for almost everything Python-related I touch, except for quick scripts or the REPL.
This is exactly what I needed as a somewhat on/off longtime user of Python.
It basically boils down to using pyenv, poetry and setting up the pyproject.toml and project src layout.
I read the comments before the article and was expecting a MUCH more complicated content and intricate bash, tool or config setups but it seems straightforward.
I feel everything else is part of getting started in professional software development (git, a modern OS, env var setup, bash or shell scripts, etc.), which is in its own right complicated for newcomers.
I think deployment and packaging vary a lot depending on the target deploy env which is why I'm fine with it being left out of the article.
One thing that I haven't been able to figure out with the 20 minutes or so of reading about poetry that I just did -- does poetry support editable installs akin to pip install -e .?
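The closest thing I could find in those 20 minutes is a path dependency with a `develop` flag in pyproject.toml, which appears to behave like an editable install; this is my reading, possibly wrong, and the package name and path below are made up:

```toml
[tool.poetry.dependencies]
my-local-lib = { path = "../my-local-lib", develop = true }
```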
Reading this article (which asks me to install 18 apt packages on my system) and the top comments here (complaints about developing, running and deploying apps, replicating deterministic Python environments and so on) makes me wonder why Python developers don't use Docker.
I think you're confused: pyenv is not a replacement for venv. It's just a tool for installing Python itself. It never occurred to me until just now that the name is confusing and makes it seem like it's related to venv.
I think that given pip and venv are bundled with Python, then anyone who wants to introduce new dependencies to replace them needs to justify themselves. I haven't used either pyenv or poetry so I have no opinion on them.
The benefit of pyenv seems clear to me - it's a simpler way to have multiple Python installs and manage which one is your "default" Python at a project, directory, user, system, and global level (they can even all be different).
Poetry...not so much. requirements.txt is good enough for me most of the time.
Poetry gives you exactly what pip and venv give you, with the two pretty much perfectly integrated, in a way that is pleasant to use and not something that you'll grudgingly migrate your project to after your fifth dependency.
Well, poetry gives you a lock file, separation between dev dependencies and runtime dependencies, and easy deployments to PyPI.
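For reference, the pyproject.toml that drives all of that looks roughly like the following; the names and version constraints are placeholders, and the spelling of the dev-dependency table has varied across poetry versions:

```toml
[tool.poetry]
name = "myproject"
version = "0.1.0"
description = ""
authors = ["Someone <someone@example.com>"]

[tool.poetry.dependencies]
python = "^3.10"
requests = "^2.28"

# Tools that ship to developers but not to production.
[tool.poetry.group.dev.dependencies]
pytest = "^7.0"
```

`poetry lock` then pins the full transitive tree, and `poetry install --without dev` (on recent versions) gives you the runtime-only set.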
If you're hacking on a one-off script you might not need these features, but for "real" projects Poetry is invaluable.
I've been running stacks with venv and pip for years and it's been perfectly adequate. The main add-on I use is virtualenvwrapper, which basically just gives you sugar like mkvirtualenv and workon.
> Yeah, but pip and venv don’t leave me impressed, so please elaborate why those are better.
Maybe you misspoke, but whether you are left impressed or not is your opinion. What is factual is that these tools are the ecosystem's default tools for their jobs, and have been for a long time. I have relied on them while operating multiple production codebases over the course of a decade.
I think you need to elaborate on what about that doesn't suit your needs rather than that they don't impress you for some vague, unspecified and potentially arbitrary reason.
To answer your question: they impress me because they get out of my way, they work, and they don't require me to do anything but have a system version of Python installed. With them, I get reasonably sane package management and environment isolation without headaches. That's enough for me.
I don't think this is for a newcomer. Newcomer, here you go:
print("Hello World!")
This is for someone who has done Python for many years, but might have fallen behind on the latest and greatest trends. It collects a series of promising, bleeding-edge tools and gives an overview of what makes them clever and how to use them.
This pops up occasionally on HN but I don't get it. Are you reading every line of every installation script you run normally? What about the things they download and execute? Are you never running a binary installer where you don't have access to the code?
This post is a snapshot of one webdev's toolbox. It certainly is indicative of webdev's predilection for increasingly complex and opaque tooling, mixed-markup formats, and over-engineering (if it can even be called engineering).
You don't really need any of that stuff. That whole tooling ecosystem undergoes a rewrite every couple years, anyways, so I predict this post will not age well.
Just stick to setup.py, pip, and virtualenv. It's sufficient unto all thy needs.
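For anyone who hasn't seen the classic trio, a project needs little more than a setup.py like this (all names and versions below are placeholders), plus `pip install -e .` inside a virtualenv:

```python
# Classic setuptools configuration; installable, editable, uploadable.
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "requests>=2.0",  # runtime dependencies, loosely pinned here
    ],
)
```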
I was with you up until your last sentence. pip and virtualenv are hacky kludges that grew legs and lungs; that's how old Python's deployment issues are: there was an era before pip and virtualenv.
There should be an abstraction for it (deployment, versions, import hacking) like what pathlib did for file path manipulation.
> What makes you think the newer tools are better than pip and virtualenv?
I don't. Heck, maybe they are. I don't use them.
My point is that rather than being the beneficiary of Python's excellent abstraction powers, distribution et al. is and has been a mess for decades.
Here's a history of the sordid mess starting from 1998: "The distutils-sig discussion list was created to discuss the development of distutils." https://www.pypa.io/en/latest/history/
FWIW, since pip and virtualenv became relatively stable and "blessed" and PyPI has matured, the only new thing I've tried is Anaconda.
Unfortunately my main Python project right now uses Tkinter and the folks who make Anaconda have a circular dependency in their build system such that you need python to build Freetype so their Python/Tkinter/TCL/Tk has gruesomely-bad support for fonts, so my project looks like a potato. https://github.com/ContinuumIO/anaconda-issues/issues/6833 Someone has put in a PR to hopefully fix it, and-- ah! --it seems to have gotten some attention earlier today: https://github.com/conda-forge/tk-feedstock/pull/40 Fingers crossed for great good!
I sort of agree, but also think you're handwaving away some actual pain points. pyenv makes it really easy to manage multiple Python installations on the same system. Also, while I prefer pip-tools over poetry for its simplicity and the fact that it has stood the test of time, both accomplish the goal of pinning all direct and transitive pip dependencies.