I really wonder whether there is any point in writing shell scripts anymore. Practically every Unix/Linux box in existence has at least some version of Python 2 that can be used as a complete replacement. I can't think of a single situation where I would need a shell script and a Python script wouldn't be much cleaner, simpler, and more maintainable.
Shell is required to be basically everywhere: it's part of POSIX, so it's standardized. Shell scripts written 30 years ago still work, and will probably work 30 years from now too. Shell is probably on your TV.
Not everyone has Python. The most common version in the field is Python 2, but that's officially obsolete. Python 3 is in many places, but not everywhere. The two Pythons are basically incompatible unless you're willing to put in the effort. You have to work to make Python scripts compatible with both versions, especially if you can only use built-ins, and the results are way clunkier than the 2-line shell script you're trying to replace.
Python is a terrible choice if your goal is a relatively short script that calls other programs. You can call out to other programs in Python, but it's much clunkier and more painful to maintain. And while it often doesn't matter (because neither is speedy), CPython 3 takes more than 50x as long to start as dash and 27x as long as bash (see https://github.com/bdrung/startup-time, which measures Python 3.6.4 at 197.79 ms, Bash 4.4.12 at 7.31 ms, and dash 0.5.8 at 2.81 ms). That speed is helpful when you want to run a "simple script in a loop".
Python is much better than shell when you start needing more sophisticated data types, libraries, modularity, etc. But the typical use for a shell script doesn't need any of that. If you need that, shell is a terrible tool, because it's the wrong tool for the job, not because it's never useful.
If you're writing shell, use shellcheck. Once you do that, your error likelihood goes way down. Many of the reports of problems writing shell come from a time when shellcheck didn't exist.
POSIX shell being fixed forever though also means it will /never/ get "fixed".
Shell's silent-failure-by-default, implicit-by-default, principle-of-most-surprise semantics will always be there, and they are /bad/.
I have read lots of shell scripts, by many people, from the experienced UNIX greybeard to the novice web developer, and they were almost all flawed. When I pointed out the flaws, the developers at every level were surprised by them. It is close to impossible to write a correct shell script, no matter the level of expertise, because the language offers neither semantics nor tooling that foster correctness.
Forgetting `set -e`, forgetting that it gets turned back off in subshells, quoting mistakes, pipefail semantics, you have to have made many mistakes to even /discover/ that these pitfalls exist (or use shellcheck, which will find some, but certainly not all of them).
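To make two of those pitfalls concrete, here is a minimal sketch (hypothetical function and variable names): `set -e` is silently suspended while a function runs as an `if` condition, and unquoted expansions undergo word splitting.

```shell
#!/bin/sh
set -e

# Pitfall 1: set -e is suspended while a function is used as an
# `if` condition, so `false` does NOT abort the script here.
check() {
  false        # would abort the script if called at top level
  flag=ran     # ...but in an `if` condition, execution continues
}
flag=notran
if check; then :; fi
echo "flag=$flag"

# Pitfall 2: word splitting on an unquoted expansion.
f="two words"
set -- $f                # unquoted: splits into two arguments
unquoted=$#
set -- "$f"              # quoted: stays one argument
quoted=$#
echo "unquoted=$unquoted quoted=$quoted"
```

The `flag=ran` result is exactly the kind of thing you only discover by being bitten by it.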
Shell is a language that looks simple on the surface, but has thousands of exceptions you have to master to write even minimally bug-free scripts.
The lack of any reasonable data structures, and strings being the prevalent data type, do not help either. For example, NixOS's linker wrapper script (written in bash) was accidentally quadratic because of repeated string appends (https://github.com/NixOS/nixpkgs/issues/27609), and that was despite NixOS having many competent bash programmers. When scripts grow, these things happen. (Note that shellcheck does /not/ point out quadratic string appends; it can only find obvious mistakes, and this is hard to detect in general.) Fixing it was difficult, and the fix was not widely understood. In a language like Python that offers reasonable set/dict data structures that people understand and use intuitively, this is trivially avoided.
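A sketch of the pattern (illustrative `-L/lib/...` flags, not the actual NixOS wrapper): each `s="$s word"` copies the entire accumulated string, so n appends do O(n^2) character-copying work; accumulating items in the positional parameters (or, in bash, an array with `args+=(...)`) and joining once avoids the blow-up.

```shell
#!/bin/sh
# Quadratic: every iteration copies the whole string so far.
slow=""
i=0
while [ "$i" -lt 500 ]; do
  i=$((i + 1))
  slow="$slow -L/lib/$i"
done

# Linear(ish) POSIX alternative: collect items in the positional
# parameters and join once at the end. In bash, an array plus
# args+=("-L/lib/$i") is the idiomatic equivalent.
set --
i=0
while [ "$i" -lt 500 ]; do
  i=$((i + 1))
  set -- "$@" "-L/lib/$i"
done
fast="$*"
```

Both loops build the same flag list; only the cost differs, which is why the bug hides until the input grows.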
Finally, POSIX does not help much when the main thing shell does is call other programs, and there are no guarantees on what the APIs for those programs are. For example, I have encountered many shell scripts that are themselves technically "portable", but use arguments to e.g. `rm` that do not work on e.g. busybox, thus crashing the boot process and costing hundreds of people hours of lost time.
By not using shell, we increase the chance that the next SOMETHING-standard will mandate not shell but something better.
A Python script is not the answer when I'm trying to do something quick and dirty at the shell.
The beauty of shell scripting is that it evolves seamlessly from trying to solve simple problems at my command prompt. I pipe a couple of things together, and then I realize I could use a loop and a few conditionals, and suddenly it makes sense to store this in a file in case I want to do it again. Boom, program done.
It now works on every computer I own.
I love Python. I use it almost every day. But I'll be damned if I'm going to go rewrite every shell snippet I hack together into Python just because it exists.
Your observation about the seamless evolution of a basic shell script from a few lines typed interactively is very insightful.
This kind of automation adds a lot of value for low marginal effort and probably explains a lot of the short scripts I have lying around in directories.
True that. But everyone should keep in mind that script bloat is real. If you're not careful you'll end up like me, and accidentally have to maintain what is essentially npm but written entirely in bash.
I've primarily used Python 2 for 10+ years and I often find cases where shell scripts are preferable.
The major differentiator is usually "shelling out" in Python kind of sucks. It's verbose, output collection and error handling suck, and escaping can be miserable. I often will reimplement things in pure Python if I have the time.
A recent example: I needed to tar+split large files. `tar cf - -C / $filename | split --bytes ${size}MB --verbose - $tmpfile`. My pure-Python implementation used the tarfile library, and I wrote a custom file-like object to split the stream--at least 50 lines. Both methods have pros/cons, but I had both implementations handy depending on the context I needed them in.
Another recent example was something I wrote to merge multiple video file "parts" into a single video using ffmpeg. It's 4 lines of shell script and would be at least 3x as long in Python.
The rule of thumb I have is: if it's longer than 20 lines, reevaluate whether a shell scripting language is appropriate. I haven't really seen a need to change that in the ~15 years I've used it.
Yeah, most of that comes from the verbose process needed to invoke a process, right? That's something I noticed when going back and forth between PowerShell and C# - that if C# had clean support for invoking a process and collecting the results as an IEnumerable like PowerShell does, PS wouldn't really need to exist, since 90% of the time you're dropping into C#/.net objects to get anything done anyways.
I actually invested some time a while back in building a nicer API for C# to invoke shell commands and process the results. The only downside IMO is the Rx library dependency for STDOUT/STDERR; I personally try to avoid depending on libraries which themselves have extra dependencies.
Since I did this at work it belongs to my employer, so I can't currently publish it freely, but it's not part of our core product so they may be open to publishing it under MIT licence or similar at some stage.
So it can be done, and has been done, but I guess most people are sufficiently happy with Powershell, Python, etc to not bother bridging the gap for C# too.
> Yeah, most of that comes from the verbose process needed to invoke a process, right?
I think it's more incongruence between the languages. os.system() will technically call a command. subprocess is a big step up from popen2, but Perl was much more streamlined and terse. That also applies to one-liners when you're in a shell.
With Python you just have differences in a bunch of things. Error handling? I often have to look up error codes, catch the exception, then check for error codes. Pipe data in? Test for sys.stdin.isatty(), then read from sys.stdin or fall back to sys.argv--it's not obscure, just not particularly Pythonic. The list goes on and on. It will probably take ~20 lines to properly deal with shelling out, and if you do things wrong you could deadlock[1]. On the plus side, discouraging shelling out means your code is more portable =P
> Practically every Unix/Linux box in existence has at least some version of Python
If we understand "box" as "environment", in general, and not literally a physical or virtual machine...
Not really. E.g. see: every clean Ubuntu Docker container. They don't have Python, nor should they have to include it by default.
And if you are going to do something like this:
apt-get update && apt-get install -y stuff
wget https://file.tgz
tar xf file.tgz
cd file
make
make install
value="$(cat result | grep something)"
mvn clean install -Dapp.value="$value"
Yeah, this would need proper securing and Shellcheck, but still I'd rather have it written like this, a concise and to the point script, instead of having to install a whole secondary language interpreter and writing a script among 3 to 10 times longer for the same result.
I recently did the opposite of what everybody seems to be doing: moved an 800-line Python script, which you had to read carefully to understand what it was doing, to a 150-line Bash script that you only have to glance at, because the important thing to know is what other, external programs are being called. Just the apt-get stuff in Python was a mess with the poorly documented Python API for Apt: some 50 lines just to do the equivalent of an apt-get install... effectively wasted mental cycles.
I've yet to find a programming language that makes I/O redirection, piping, and process substitution[1] as easy as Bash does. Process substitution is where the shell really shines, in my opinion.
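For anyone who hasn't seen it, process substitution (`<(cmd)`) lets a command's output stand in for a filename, so you can, e.g., diff the output of two commands with no temp-file bookkeeping. It's a bash/zsh feature, not POSIX sh. A toy sketch with made-up data:

```shell
#!/usr/bin/env bash
# Compare the sorted output of two "commands" directly.
# (Toy input files; normally the substituted commands would be
# real pipelines, not sort over scratch files.)
printf 'b\na\n' > /tmp/ps_demo_1.$$
printf 'a\nb\n' > /tmp/ps_demo_2.$$

if diff <(sort /tmp/ps_demo_1.$$) <(sort /tmp/ps_demo_2.$$) >/dev/null; then
  same=yes
else
  same=no
fi
rm -f /tmp/ps_demo_1.$$ /tmp/ps_demo_2.$$
echo "same=$same"
```

Doing the equivalent in most general-purpose languages means spawning both processes yourself and wiring up pipes or temp files by hand.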
Bash, and Bash-like shells, are literally everywhere. I have to be wary about what Python 3 features I use, and if there will even be a Python interpreter available. My OpenWRT router has a Bash shell, but I don't care to install and maintain other interpreters or runtimes on an embedded platform.
Even job management and parallelism are easier in Bash and GNU Parallel.
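Even in plain POSIX sh, fan-out/join is just two constructs: `&` to background a job and `wait` to join them all. A toy sketch (GNU Parallel layers throttling, retries, and output ordering on top of the same idea):

```shell
#!/bin/sh
# Run three "tasks" concurrently, then wait for all of them.
tmp=$(mktemp -d)
for i in 1 2 3; do
  ( printf 'done %s\n' "$i" > "$tmp/$i" ) &   # each subshell runs in parallel
done
wait                      # join: blocks until every background job exits
finished=$(ls "$tmp" | wc -l | tr -d ' ')
rm -rf "$tmp"
echo "finished=$finished"
```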
I don't know about process substitution, but Dart makes connecting processes together pretty easy, and you can do way more fancy things than Bash lets you do, in a clean, fast language with basically no gotchas.
If your OpenWRT router has a bash shell, then you optionally installed it from packages. The default shell is Busybox, which is a minimal shell that supports few if any "bashisms." In other words, you certainly do have to be wary about what shell features you use.
PowerShell, mostly because PowerShell was designed as a mashup of Bash and C#.... And it's kind of a trainwreck in a lot of ways.
It really feels like piping and easy process invocation and compile-time directory awareness wouldn't be massively onerous to add to an existing full-featured programming language so you wouldn't have to sacrifice a good type system and powerful syntax when you want to do scripty things.
PowerShell has real data structures (arrays and hash tables), built-in functional programming tools (Select-Object, ForEach-Object, Where-Object), GUI cmdlets (Out-GridView) and direct access to .NET libraries. It’s a viable tool for many complex tasks where on Unix, you’d need to reach for Python rather than a shell.
Some of the features of PowerShell seem really enticing, and I wish they would make their way over to Unix-like shells. Unfortunately, I don't know how much utility I'd get out of PowerShell on Linux, so I haven't tried it.
edit: your comment inspired me so I installed PowerShell via Snap and am going to give it a go on Linux.
To me PowerShell mixes easy access to command-line tools and higher-order object-oriented map/filter style functions. The mix between Unix-style piping from command-line tools and modern object-y mapping and filtering can feel like a dream.
That's the good part. The bad part is that it's easily one of the wartiest languages I've ever used; for such a young language, it has an inexcusable number of legacy problems.
A colleague of mine (who also uses it heavily) says "two Googles per line".
Just to get sane parameter and variable checking you have to throw up a bunch of attributes and set a bunch of flags.
Be sure to set up strict mode, use cmdletbinding on your parameter blocks, and set ErrorActionPreference to "stop" in every script you write.
That and the quality of many of the first-party official PowerShell modules is appallingly low by Microsoft standards. The good side is that the .NET framework is accessible, so whenever a PowerShell module is failing you, you can drop down into raw C#-style assemblies, which are much higher quality but speak PowerShell with a very thick accent.
But I'm using it on Windows, YMMV on Linux.
I'm infatuated with the concept, but the implementation leaves a lot to be desired.
Shell scripts are readable by just about anyone, they're available on every UNIX system, not just the Red Hat/Debian-derivatives of the last twenty years, they're fast as long as you're not doing stupid things, they're easily maintainable, they don't handle dependencies terribly (unlike Python), and so forth.
There's a reason AT&T used to run ads that showed their secretaries, managers, and so on using and writing shell scripts and there's never been a Python ad claiming that just anyone could write it.
I don't think that's a fair comment. It was a different era with different expectations about computer users. The equivalent these days would probably be Excel macros.
Python makes sense for scripts that need good argument parsing, or complicated intermediate input processing. But it's pretty annoying to get a shell pipeline working in Python. `foo | bar | baz` is about 15 characters in bash.
Correct me if my first impression was incorrect, but scanning through your scripts they all seemed like sub-kilobyte one-liners. Which, yeah, just use Shell for those and you'll be fine.
To my mind, the parent post was referring to lengthier, more complicated scripts. Speaking as one who wrote and routinely maintained such scripts at one point, I cannot agree more with the sentiment that most if not all of them could and should have been written in Python or Ruby or some other scripting language instead.
Looks like a lot of your scripts could be aliases. I store mine in ~/.aliases, and slightly more complicated things (e.g. ones that take arguments) in ~/.functions, and source both those files from bashrc.
I have maybe 3 standalone shell scripts in my PATH, despite writing thousands of lines of shell.
I've moved from aliases to scripts as I can't seem to have found the (if any exists...) way to ensure aliases can be used from within vim's ":!..." or ":'<,'>!...", and as I often find myself wanting to either execute something while I'm amending code, or wanting to filter text with a purpose-built command, I end up writing (sometimes very short!) shell scripts.
I used to do that with the help of Fabric/Invoke, and it was a pleasant experience overall, but sometimes required too much verbosity. Then I discovered Plumbum (https://plumbum.readthedocs.io/en/latest/) with its concept of combinators, which I liked a lot more.
Eventually it motivated me to switch the language for shell scripting for the second time, and nowadays I recommend Haskell's Turtle (https://hackage.haskell.org/package/turtle-1.5.16/docs/Turtl...) to anyone who is interested in safer shell scripting and still likes a concise and terse syntax (and it's blazingly fast too).
The shell is the true user interface to the OS. I like to think of writing shell scripts as having some mouse clicking automation tool on a GUI shell. It should be viewed as just as kludgy of a solution. But there really isn't anything more convenient most of the time. Shell scripts are to be called by humans to automate behavior they otherwise would have had to do manually.
An application should never rely on a shell script. If you have to execute another program, use the syscalls (fork and exec on posix).
> An application should never rely on a shell script. If you have to execute another program, use the syscalls (fork and exec on posix).
Could you explain why? I'm very happy with programs calling shell commands/scripts. You execute another program the same way whether interactively from a shell, or by calling system() from another program. The simplicity and universality of the call syntax is an advantage.
Security. system() is one of the most common targets for hacking (getting a shell by manipulating the string passed to system() by various means). Invoking the program directly with fork/exec is a lot better behaved: you're limited to executing exactly one program, with no shell interpreting the arguments.
I love python, and half my ‘shell’ scripts are python, but the mere fact that I have to import something to exec something or read a file means that bash is easier to build and test one line at a time. And good bash pipelined one-liners are a reason I’ll never write all of my scripts in python. It’d be hard for me to calculate how often I use constructs like ‘grep | sort | uniq | cut’. That takes a lot of python code.
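For instance, the classic frequency-count idiom from that family (toy data fed in via printf here): sort to group identical lines, uniq -c to count them, sort -rn to rank by count.

```shell
#!/bin/sh
# Most-frequent-line pipeline: group, count, rank, take the winner.
top=$(printf 'b\na\nb\nc\nb\n' | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "$top"
```

The Python equivalent (collections.Counter plus explicit I/O handling) is correct and readable, but it's a dozen lines instead of one.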
I still think for very small stuff it's easier to just write a shell script. For instance a docker entry point, where you set a few env vars, download some required files and start your main application.
It’s a real pain to handle subprocesses in Python. If you need to automate certain command-based workflows, Bash scripts are much easier, both to write and to read, until they reach a certain size. At my previous job it was a daily task, and the scripts involved tricky stuff related to Subversion, Git, and builds (complex CI/CD, generally).
The problem is that very few people know shell scripting well, but once you get to know it, it’s not that bad, in quite specific cases.
You don't have to pick between the two. Shell scripts are great and give you the option of using battle tested commands that have stood the test of time.
When you don't need to do any actual logic and just want to run a sequence of commands that you've been typing in by hand, and you already know the commands, and you don't want to go look up the Python versions of them all.
IMO, the power of shell is really the ability to leverage command-line tools (e.g. git, curl, jq) with very little code. These tools are very fast, well-tested, feature-rich, and often easy to install in a reproducible way.
IMO, the inflection point where you should stop using shell is very low, though. The Google style guide for shell recommends rewriting scripts once they exceed 100 lines, and I think that is probably too generous.
git extensions are an unfortunate counter-example. git relies heavily on shell scripts, and it even has a shell "library" [1].
I wanted to create a custom git command and started with Rust/libgit2, but found it was missing too much. Shell scripting proved to be the most natural, best supported approach (though it was nevertheless very painful).
The problem with Python vs shell scripts is that Python is a bloated mess of a platform. Python tries to be all things to all people, through a messy package system that isn't cross compatible with different platforms. bin/sh|bash is 1/100th the size of python and rock-solid.
I think the main issue is: just because someone gets something to work on their system with their Python install, they think it works everywhere. And that is not at all true. There's a reason bin/sh|bash are so pervasive in administration: they are small, simple, well-understood, and resistant to the kind of bloat that Python chokes on.