This article is written in kind of a controversial way, but the throughline of the argument seems to be "use Heroku until you have 100k users".
This seems very reasonable to me. I thought it was going to be a pitch for on prem, which is also fine for certain scales.
I think generally the scaling steps from startup to megacorp go:
Heroku/Dokku > Public Cloud > Dedicated servers in someone else's DC > Custom hardware in custom-built data centers.
Each makes sense at each scale. I find it to be more of a right tool for the job consideration than one being better than the other.
With modern cloud tooling your infra can also look more or less logically the same once you grow past the heroku level.
> Of course this is function of what you're optimizing for, and whether you want to go down the "boring monolithic app" route.
Microservices do add some overhead but it's not extreme - even a microservices-based app can run just fine on a decent bare-metal box if you run all the services on it.
Of course, the talk of microservices brings the question of what problem you're actually trying to solve - are you aiming to build a technical solution to a business problem or are you aiming to create an engineering playground so there's endless busywork and justification for hiring lots of engineers? If it's the latter, then bare-metal is going to be a bad option anyway as it's not the kind of toy a typical startup engineer wants to play with.
I really agree with you, what's weird though is how many mega-corps are going away from Custom Hardware in Custom Built DC towards Cloud.
There's also something to be said for buying a VPS or a colo machine, making sure it's backed up, and dealing with the 9's that you get from that machine on its own. I am routinely surprised by how far a single-node machine will get you.
> what's weird though is how many mega-corps are going away from Custom Hardware in Custom Built DC towards Cloud
It costs a lot of money to run your own datacenters, and very, very few companies are capable of doing it as well as AWS or even Scaleway/OVH can. By that I mean waiting weeks/months to get through tickets, approvals, and multiple different teams just to get a server deployed, then waiting a few more weeks for monitoring/backups.
Allowing developers and related teams to get hardware/software on a whim is a massive advantage.
If you open a ticket with real remote hands you should get a response back in minutes; typically someone will be on-site in your cage in under an hour. You also don’t “deploy a server” to reduce load; you plan ahead every few months and deploy thousands at a time. Even then, if you have a good relationship with a really good systems integrator, they can ship and rack machines in a matter of days, not weeks.
I’m increasingly convinced that the large-scale companies that don’t do bare metal avoid it because they never learned how, and all the people advising them have never done bare metal or have done it poorly, so it’s the blind leading the blind. But they are leaving 50-90% cost savings, plus better control over reliability, latency, data residency, etc., on the table by doing so.
> If you open a ticket with real remote hands you should get a response back in minutes; typically someone will be on-site in your cage in under an hour
Remote hands won't order your servers, configure your networking, install OSes/configure your PXE, and all the other tedious things running your own DC entails.
Yes, most DIY DCs are done terribly, that's the whole point - if so many people struggle with that, doesn't it make sense to just outsource it?
They will do whatever is in the contract. Yes, they will hook up a crash cart and PXE the server (If you need that), done it hundreds of times in the old days. However in modern datacenters no one "installs OSes and configures the network". You plug it in, turn it on, everything self-provisions and starts serving traffic.
> However in modern datacenters no one "installs OSes and configures the network". You plug it in, turn it on, everything self-provisions and starts serving traffic.
Absolutely agreed. That's what I used to do, in part, and it's a massive effort to do everything automatically and efficiently; it needs multiple people's time to create and maintain all the infrastructure, the glue between different systems, scripts, and tools. Even components as basic as DHCP absolutely suck (your options are either something from the 1990s, isc-dhcp-server, which lacks an API in any real sense, or Kea, made by the same people, which really shows), and before Tinkerbell there was literally nothing that could be used to automate such a thing at scale.
And more to my point, how many datacenters do you think are "modern"? I've only encountered one that was starting to get there (where I used to work before an acquisition by a company with arcane practices in their DCs), and having worked with hundreds of customers with "on prem" stuff, for the vast majority it's a legacy horror show.
I'd agree that unless you can really profit from having very specific hardware, you're better off renting dedicated servers than colocating servers you own. Have somebody else worry about having people on call to switch out failed hard drives.
Comparing either to AWS will inevitably lead to a much more complex discussion about spot instances, traffic costs, ancillary services etc.
I don't like the idea that the only way to get developers moving is to use cloud, but I agree that it's a solid replacement for really bad ops.
What I've seen in many places is an abstraction over bare metal, some are better than others, openstack, Kubernetes on-prem, vmware etc; are all solutions that have differing amounts of adoption. Ubisoft had a lot of stuff in this area, as does Google. Ubisofts was pretty terrible though.
If you need a physical machine to be deployed, you've hit a certain level of scale and your load is much better known; and even though it can take a few weeks, what you get back is quite competitive.
But if you're waiting for hardware to get anything moving in the first place then that's obviously bad.
What I've taken to doing is prototyping on Google Cloud and then planning to migrate things to on-prem once everything is reaching maturity.
It also lets a CFO convert capex to opex (which may or may not have tax implications), and it turns a cost center on the balance sheet into service payments, which makes CFOs look better even if it's net worse for the company.
USA, so we have weird tax rules. Also, I dunno, the CFO could have been drinking the sales team's Kool-Aid (we were also trying to sell enterprises on moving to the cloud). And there was a lot of drinking (and hookers and probably blow) at that company.
> It costs a lot of money to run your own datacenters,
If you actually do the math it is pretty much a wash vs using AWS. Yes, you will pay a lot more upfront, but over a 5-year period (standard warranty length and typical depreciation time) it pretty much evens out compared to AWS. I am sure there are many use cases where on-prem would actually be cheaper than AWS over 5 years.
At the companies I work for, the red tape isn't nearly as bad as you make it seem (or have perhaps experienced at places you have worked). The biggest time sink right now is the ongoing supply chain issues and vendors just not having equipment; the approvals/tickets are pretty quick where I work.
If you have a little spare capacity, developers can still get hardware/software on a whim, at just a (comparatively) small one-time expense. Spare capacity is much cheaper than people make it out to be.
The upside is that it's much cheaper once you're at the scale where you no longer need to variableise your compute costs, but can absorb the up-front fixed costs and do proper capacity planning.
> If you have a little spare capacity, developers can still get hardware/software on a whim, at just a (comparatively) small one-time expense. Spare capacity is much cheaper than people make it out to be.
Those people are speaking from greater experience - there are many things which seem easy but aren’t once you’re over a certain scale, and at large organizations you often have things like conflicting policies or coordinated demand (e.g. your slack capacity disappears when every project is trying to hit the same budget deadline or a change moratorium ends, a pipe breaks in building A and you need to shift a ton of previously-stable systems for 6 months, etc.).
You can do that kind of capacity planning well but it’s harder than it looks and often politically challenging because the benefits aren’t obvious. Cutting corners looks like saving money right up until it doesn’t. If you aren’t buying servers by the hundred or storage by the petabyte, you are unlikely to be competitive with a cloud service without sacrificing some combination of performance, reliability, timeliness, and security.
I think you and I are saying basically the same thing, only putting emphasis on different aspects due to our various historic experiences.
Look at your demand pattern (variable or stable, predictable or unpredictable) and what cost structures your finances can support (variable or fixed, up-front or as-you-go), pick a solution based on that, not what's cool.
Add to that your barriers to entry: staff, facilities, process, etc. It doesn’t help to save on servers over a cloud provider if your procurement process means you have people sitting idle for 6 months.
This. I took the wrong lesson from the DDoS attacks on Linode in late 2015 (particularly the one on Christmas Day), and the intermittent issues I encountered with DigitalOcean and Vultr in 2016 while both providers were still fairly young. A single dedicated server from a mature provider (ideally not during its hyper-growth phase) is pretty reliable.
> what's weird though is how many mega-corps are going away from Custom Hardware in Custom Built DC towards Cloud.
Why is it surprising? Building and maintaining custom data centers is a big, slow business initiative. It takes months to years of forecasting to get the data center buildout to match the business needs, as opposed to the extreme flexibility of using a cloud provider.
> There's also something to be said for buying a VPS or a colo machine, making sure it's backed up, and dealing with the 9's that you get from that machine on its own. I am routinely surprised by how far a single-node machine will get you.
For personal projects this is exactly what I do. It’s great until something goes wrong with that one machine or VPS.
But it’s not really a good option for any business that needs consistent operations and uptime. Years ago I worked at a company that tried to self-host some of their collaboration tools on a VPS to save money over the cloud-hosted versions. When the server went down it stalled productivity for a day while the team restored a backup, with another week of confusion as we tried to find all of the things that were lost between the last backup and when the server went down.
When someone did the rough estimations on how much it cost to pay everyone’s salaries for that day of lost productivity, the number was far higher than the trivial cost savings we got from self-hosting. We also had a constant background burden on someone internally to maintain and monitor the server, plus the burden of them being on call. Often, moving to cloud anything can be a huge load off the company’s back.
> as opposed to the extreme flexibility of using a cloud provider.
I don't really buy this honestly.
What you buy with cloud providers is quality tooling, not flexibility.
If you're bin-packing with Kubernetes properly then capacity is capacity and it doesn't matter if the marketing department are using it or the developers are. You just buy a bunch of servers and when you see the load approaching 70% you buy more. It's a 2 person job.
Is it harder? Yes. Definitely.
Is it a panacea? No. Not at all.
Is it universally cheaper? Also no. Definitely not.
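That "watch for ~70% and buy more" loop reduces to a back-of-the-envelope check. A minimal sketch, where the growth rate, lead time, and core counts are all illustrative assumptions:

```python
# Back-of-the-envelope check for the "order more at ~70%" rule.
# Growth rate, lead time, and core counts are illustrative assumptions.

def should_order(used_cores: float, total_cores: float,
                 weekly_growth: float, lead_time_weeks: int,
                 threshold: float = 0.70) -> bool:
    """Order hardware now if projected utilisation at delivery time
    (current usage compounded over the lead time) crosses the threshold."""
    projected = used_cores * (1 + weekly_growth) ** lead_time_weeks
    return projected / total_cores >= threshold

# 640 of 1024 cores used, 3% weekly growth, 8-week hardware lead time:
print(should_order(640, 1024, 0.03, 8))  # -> True (projected ~79%)
print(should_order(300, 1024, 0.03, 8))  # -> False (projected ~37%)
```

The point being that the decision comes down to a few inputs you can put on a dashboard; the hard part is keeping the growth estimate honest.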
I feel like whenever I talk about the Cloud as an expensive thing that people get emotionally defensive.
I'm not here to take your toys away.
Services like cloud are just tools and tools always have pros and cons.
If you can't reasonably discuss the cons without resorting to "I need to hire more staff" or "it's a lot better than $strawman", then we're just cargo-culting.
> it’s not really a good option for any business that needs consistent operations and uptime.
Most business cases for computers can just eat the downtime honestly. Your URL redirector doesn't need 5 9's. There's a grading scale of complexity and uptime, on one side you have a single hosted server that has profoundly strong uptime (especially with the redundancies in normal servers); then you start adding complexity to get HA, and weirdly: the complexity lowers the reliability.
If you keep following the line of redundancies and HA complexity, eventually you can get to a point where the service is even more reliable than a single node. Which is what everyone assumes they will get straight away, but usually it's a lot of work to get there.
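That trade-off falls out of naive availability arithmetic (the 99.9% per-component figure below is an assumption for illustration):

```python
# Illustrative availability math: serial dependencies multiply,
# while redundant replicas only fail when all of them fail together.

def serial(*components: float) -> float:
    """Availability when every component must be up."""
    p = 1.0
    for a in components:
        p *= a
    return p

def redundant(a: float, n: int) -> float:
    """Availability when any one of n identical replicas suffices."""
    return 1 - (1 - a) ** n

single_node = 0.999  # one solid dedicated box, ~8.8 hours downtime/year

# Naive "HA": one LB + redundant app tier + one shared DB.
# The two remaining single points of failure drag it below the single box.
naive_ha = serial(0.999, redundant(0.999, 2), 0.999)

# Mature HA: every tier redundant.
mature_ha = serial(redundant(0.999, 2), redundant(0.999, 2), redundant(0.999, 2))

print(naive_ha < single_node)   # -> True  (~99.8%)
print(mature_ha > single_node)  # -> True  (~99.9997%)
```

Each box you add in series subtracts reliability until every tier is actually redundant, which is exactly the "worse before it gets better" curve described above.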
> the trivial cost savings
This will differ a lot.
I made two games, one was hybrid-cloud and one was bare metal only; the cost savings were not trivial. If we had 100% clouded the hybrid deployment we would easily have paid 10x in the hosting costs which would have been enough money to pay for 250 contractors at a premium rate.
Cloud also offers flexibility in buying equipment "just in time" and returning it when projects are cancelled. You don't need 6-18 month lead time to acquire and install hardware which then gets wasted if the project gets cancelled or rescoped. Having massive capex projects converted to opex is very appealing for a lot of businesses
As a software engineer who doesn't really like devops and has been in this position multiple times, I'm a huge fan of buying à la carte services from different providers that specialize in managing a specific type of service (often since they are the developer/maintainer of said service). As long as you make sure they are all in the same datacenter, you still get great performance. And typically minimal configuration woes.
For example:
- Datacenter: AWS us-east-2
- Dockerized web servers / task servers: Render or Engine Yard
- Postgres & Kafka: Aiven or 84codes
- Redis: Redis Labs
- Unified logging: Elastic or Grafana
I still end up using some underlying AWS services like S3 and lambda, but it's a lot less work than managing an entire AWS ecosystem with security groups/VPC/networking etc.
That can work well but you have to be careful/mindful of egress charges and latency (if the server supports co-locating in the same cloud or even managing directly in your account, it can alleviate those issues)
Thing is that for most/many startups 100k users is not a lot. Rejiggling your basic infra just as your growth is starting to accelerate is a non-trivial task, a risk, and something that doesn't fundamentally move the needle.
This really depends on what you are building of course. Working in enterprise SaaS with only a few users per account? You'll be doing really well at 100k users.
> Thing is that for most/many startups 100k users is not a lot.
Depends; if you're a startup offering free or ad-supported services and the exit plan is "be bought out by existing entrenched competitor", then, yes, 100k users is not enough to hit your goals.
If you're a startup offering B2B services, even 10k users is enough to be madly profitable.
I don't think this is good advice in general, and I haven't seen many companies do it in practice. The initial decisions a company makes are, most of the time, hard to change. Moving a successful business with clients from one platform to another 3 or 4 times is very difficult, if not nearly impossible.
It may work for a simple website, but for any more complicated project with web clients, mobile clients, and third-party integrations, migrating from Heroku to a cloud provider to on-prem means refactoring big parts of the project.
An even bigger problem is that a migration like this is hard to do incrementally.
I don't see the point of public cloud. In practice, it still requires a sysadmin (now called "DevOps engineers") so it's not any better than rented bare-metal in terms of maintenance overhead, while still being extremely expensive.
Use a managed PaaS to begin with (you pay more but it does genuinely save you time as there is no management overhead), then when you're ready to do things yourself go straight to hosted bare-metal, and only use public cloud services for their managed services that you can't replicate yourself (think Redshift/Athena/Aurora/etc).
> so it's not any better than rented bare-metal in terms of maintenance overhead
In my experience the maintenance overhead of the cloud is much lower. My dayjob (B2B SaaS) spent about 75% of the infrastructure team’s time on things like patching switch firmware, balancing UPS loads, diagnosing flaky switch ports or transceivers, managing logging growth, etc. None of that made our products better from a customer perspective.
Since our cloud move those same infra staff support many more services and apps with much faster turnaround for product teams. And we traded upcoming multi-million capex investments in servers/switches/appliances into a monthly cloud bill that scales much more closely with revenue.
The public cloud is for businesses constrained by people; we simply could not afford to hire enough people to do the same stuff on-prem or in colo.
I've never had to worry about messing with switches/cabling/UPSes with my Hetzner or OVH servers.
I'm not sure why people always believe that bare-metal == self-managed colo. That's not the case and colo only really makes sense for very large companies who can actually save by buying & managing their own hardware or have specific requirements that dedicated server providers don't offer.
Log growth can be addressed by using a managed service (including AWS Cloudwatch if you really wanted to, but you'd have to be a masochist). Frankly, you'd have the same issue if you were on EC2.
I would recommend starting with DigitalOcean. It may have a little more overhead, but it's much more cost-effective and you can stay on it for longer before migrating.
> If you're an indie hacker, a boostrapper, a startup, an agency or a consultancy, or just a small team building a product, chances are you are not going to need the cost and complexity that comes with modern cloud platforms.
Hard disagree.
- On cost: there is almost nothing better for the indie hacker, bootstrapper, or startup than cloud services.
I run apps on all three platforms (Google, AWS, and Azure) and my monthly spend is less than $2.00, using a mix of free-tier services and consumption-based services (Google Cloud Run, Google Firestore, AWS CloudFront, AWS S3, Azure Functions, Azure CosmosDB).
- On complexity: if you've used Google Cloud Run or Azure Container Apps, you know how easy it is to run workloads in the cloud. Exceedingly easy. I can go from code on my machine to a running API in the cloud that can scale from 0 to 1,000 instances in under 5 minutes, just by slapping in a Dockerfile _with no special architecture or consideration, no knowledge of platform-specific CLIs, no knowledge of Terraform/Pulumi/etc._
The current generation of container-based serverless runtimes (Google Cloud Run, Azure Container Apps) is pretty much AMAZING for indie hackers; use whatever framework you want, use middleware, use whatever language you want. As long as you can copy/paste an app runtime specific Dockerfile (e.g. Node.js, dotnet, Go, Python, etc.) in there, you can run it in the cloud, and run it virtually for free until you actually get traffic.
If any of the projects take off, then pay to scale. If they don't take off, you've spent pennies. Some months I can't even believe they charge my CC for $0.02.
The AWS free tier lets you do a lot, and if you use it well, it lets you avoid up to about $50/month of digitalocean bills.
If you're never planning on scaling past a hobby project, the free tier is a great place to stay. If your hobby project "goes viral," though, it might cost you a few thousand dollars, but hopefully that helps you get a lot more money to turn your hobby into a business.
If you have commercial intent, however, $50/month goes from an expensive hobby (3 streaming services) to a very cheap business. At that point, the fact that you don't have to pay for scale on DO VMs and other platforms actually makes a lot more sense. You can sleep at night knowing that you will still have a business even under a load spike, and $50 of digital Ocean buys you roughly the compute power of $1000+ of AWS managed services.
The beauty of container based serverless is that you have portability. If your hobby project takes off and you want to run it on DO up to a certain ramp, you can still move your container workload into DO.
Google Cloud Run, Azure Container Apps, and AWS AppRunner (less so because it doesn't scale to zero) are really great tools for hobby devs and small shops.
> The beauty of container based serverless is that you have portability.
I think this is the major take away I am having from this hype cycle of the Cloud: that if we just build containers/functions, we can sort everything else out however the credit card allows.
I'm on the fence about if it should be DO or EKS or GKE or whatever. That's for Credit Card Man and me to decide. As an engineer, I just want to build a docker container and call it a day.
Yep, we were easily saving a developer salary per month vs AWS using colo’d hardware even as a very small company. And god help you if you’re trying to run something bandwidth intensive on AWS.
You use the spare that's next to it, failing that, in a desperate situation, go to any computer store, buy the biggest gaming PC you can find, load it up with RAM and use that. It'll work, your customers won't know nor care.
In practice, I wouldn't recommend colocation until you are at the scale and budget where you can maintain your own spares. Better outsource it to a dedicated host like OVH/Hetzner/etc that has tons of servers and can immediately replace the hardware.
Then there are a few other physical servers with load balanced redundant VMs, and it fails over seamlessly. HAProxy makes this pretty easy to handle. But we almost never had any hardware issues, servers are pretty reliable.
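For reference, an active/backup failover of that kind in HAProxy is only a few lines (hostnames, addresses, and the health-check path below are placeholders):

```
# haproxy.cfg sketch: traffic goes to app1; app2 takes over only
# when app1 fails its health checks. Addresses are placeholders.
defaults
  mode http
  timeout connect 5s
  timeout client  30s
  timeout server  30s

frontend www
  bind *:80
  default_backend app

backend app
  option httpchk GET /healthz
  server app1 10.0.0.10:8080 check
  server app2 10.0.0.11:8080 check backup
```

The `backup` keyword is what makes app2 a hot standby rather than a load-balanced peer.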
Actually I think that private cloud-like architecture (a bunch of physical servers with e.g. Dokku running on them) can be a good solution in some circumstances.
Yes, this isn't as much of a gotcha as you make it seem. On prem is literally running your own cloud. The architecture that all the big cloud providers use of racks and racks of servers running hypervisors deploying VMs using a shared storage tier and SDN is the same thing you build when you're on-prem. You are now just the implementor.
If you run an RDS instance, I can see hitting that.
But there are alternatives like using Supabase or any of the Postgres or MySQL aligned serverless DBs like Cockroach, Planetscale, or other services if you want relational semantics and still be serverless.
On AWS, the load balancer is also surprisingly pricey; easily eats up $20/mo.
The pattern that is free on AWS and costs a fair amount on digital ocean is one or two VPS-es (or equivalent serverless/kubernetes compute) plus a managed database instance. If you don't have the managed database, the AWS free tier is a lot less attractive.
Yeah I can’t agree with you at all. Without setting up your own NAT Gateway on EC2 on a t2.micro instance (or something cheap like that), the managed one runs you about $30 a month. This isn’t even accounting for database costs, usage costs, development costs, etc.
So right away that “I can’t believe they even charge my CC for $0.02” is real suspect. Do you have a completely empty AWS account?
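For what it's worth, the ~$30 figure is easy to reconstruct from list prices (the rates below are assumed us-east-1 prices at the time of writing; check current AWS pricing before relying on them):

```python
# Ballpark monthly cost of a managed AWS NAT Gateway.
# Rates are assumed us-east-1 list prices; verify against current pricing.
hourly_rate = 0.045       # $ per NAT gateway hour (assumption)
per_gb_rate = 0.045       # $ per GB processed (assumption)
hours_per_month = 730

base = hourly_rate * hours_per_month       # ~$32.85 before any traffic
with_traffic = base + per_gb_rate * 100    # +100 GB processed -> ~$37.35
print(round(base, 2), round(with_traffic, 2))
```

So the gateway costs ~$33/month just for existing, before a single byte is processed.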
> Without setting up your own NAT Gateway on EC2 on a t2.micro instance...
The problem is that you're using EC2 instead of AWS App Runner, Google Cloud Run, or Azure Container Apps.
> We haven’t even spoken about dev experience yet.
I'd strongly recommend that you give Google Cloud Run a try. You can go from an empty codebase to a running, on-demand serverless runtime via GitHub with only a Dockerfile. I can build an app from scratch and have it running in Google Cloud in probably under 3 minutes with no special CLI knowledge or build setup.
Here's a sample Dockerfile I'd need to get a dotnet app into Google Cloud Run:
# The build environment
FROM mcr.microsoft.com/dotnet/sdk:6.0-alpine AS build
WORKDIR /app
COPY . .
RUN dotnet restore
RUN dotnet publish -o /app/published-app --configuration Release

# The runtime
FROM mcr.microsoft.com/dotnet/aspnet:6.0-alpine AS runtime
WORKDIR /app
COPY --from=build /app/published-app /app

# The value "production" is used in Program.cs to set the URL for Google Cloud Run
ENV ASPNETCORE_ENVIRONMENT=production
ENV IS_GOOGLE_CLOUD=true

ENTRYPOINT [ "dotnet", "/app/my-app.dll" ]
Every other aspect of the code remains unchanged. GCR will pull the code from GitHub, build the container, and operationalize it.
Authentication and authorization: I hand it off to Firebase identity management. But you can also just issue your own JWTs. You can run full applications in GCR like KeyCloak or IdentityServer. But the Firebase identity solution is really good.
Persistent data: Firestore (or Supabase, Planetscale, or CockroachDB if you want relational). On other platforms, I've used Azure CosmosDB which has a pay-as-you-go model which is practically free for hobby/POC use cases.
Secrets: depends on how secret it is; actual keys go into secrets manager in GCP which integrates with Cloud Run. Otherwise, you can configure it as an environment variable.
Agree, we bootstrapped a business from cents a month to $4 or $5 now, there's no maintenance, and I know that if we get mentioned on Oprah, we'll be able to cope with a blip of thousands of signups a second. I know how I'd run the system on our own hardware, but can put off that decision until (or if) we need it.
If you're spending less than $2 per month, how much traffic, and therefore how much money, can you be making?
Sure, I also have plenty of static websites hosted for free by vercel / netlify / heroku / yourpick and even free functions.
As soon as you start hitting traffic, functions start to cost a lot vs your own vps.
My ideal setup right now is free static hosting from the marketing budget of friendly saas, free cloudflare on top and then APIs hosted on small vps (I have plenty of stuff on digitalocean but if I were to start from scratch I'd go fully with hetzner).
I avoid the big 3 as much as I can and I laugh for hours when I see the bills of clients using them.
This will get you pretty far for $2/mo. Within the free tier itself, assuming you can process each request in 250ms on a 1 vCPU container, you get 720,000 requests before you start paying for compute usage. Each $1.00 is another ~38,000 vCPU seconds (@1 GiB second) or ~152,000 requests @ 250ms per request.
Roughly speaking, $2/mo. is 1 million requests @ 250ms each request consuming 1 GiB seconds on a 1 vCPU container.
(There's some nominal cost for egress and storage of container images).
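Spelling that arithmetic out, using the rates the parent figures imply (assumed tier-1-region list prices at the time of writing; check current Cloud Run pricing):

```python
# Cloud Run cost arithmetic for a 1 vCPU / 1 GiB container at 250ms/request.
# Prices and free-tier numbers are assumptions matching the figures above.
vcpu_rate = 0.000024        # $ per vCPU-second (assumption)
mem_rate  = 0.0000025       # $ per GiB-second (assumption)
free_vcpu_seconds = 180_000 # assumed monthly free tier

seconds_per_request = 0.25

free_requests = free_vcpu_seconds / seconds_per_request          # 720,000
cost_per_second = vcpu_rate + mem_rate                           # 1 vCPU + 1 GiB
seconds_per_dollar = 1 / cost_per_second                         # ~37,700
requests_per_dollar = seconds_per_dollar / seconds_per_request   # ~151,000
```

That lines up with the ~720k free requests and roughly 152k additional requests per dollar quoted above.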
The thing that holds me back from Google Cloud Run is that it is difficult to replicate that exact environment locally for testing and dev. If you’re working with stateless apps then that’s fine, but what is the typical local workflow for developing against a database, task queue, etc.?
If you're going all-in on Google Cloud and using Firestore, then use the emulators [0]. The emulator suite includes Pub/Sub. For Cloud Tasks queues, use an unofficial emulator [1]
If you're not going all-in on Google Cloud and say you want to use Postgres, then use a `docker-compose.yaml` file and pull in a Postgres container instance or run a local Postgres if you want. Then pick a free Postgres compatible cloud service for the actual runtime (e.g. Supabase free tier). Same goes for MySQL.
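A minimal version of that `docker-compose.yaml`, with a placeholder password and an arbitrary Postgres version, might look like:

```yaml
# Local-only Postgres for dev/testing.
# Credentials and version are placeholders; never reuse them anywhere real.
services:
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_PASSWORD: localdev
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```

`docker compose up -d` then gives you a throwaway database on `localhost:5432`, and only the deployed environments point at the managed cloud Postgres.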
Thank you for the reply. That makes sense… I guess something about me just likes the purity of running the exact same container on my local machine as would be in prod, but yeah I agree that at some scale that doesn't work.
If you're using an IaC tool like Terraform or Pulumi, you can just set up and tear down test resources on demand (for integration/acceptance tests). Under normal usage, hopefully you can get away with mocks/stubs/fakes. Some development frameworks make this a lot easier than others.
Using real resources is usually fine for smaller applications but can be very problematic as your application grows. With that in mind, it's good to create boundaries so you limit the amount of "real infrastructure" you need to test/deploy. Reference https://martinfowler.com/bliki/IntegrationTest.html
Thanks for taking the time to reply. Yeah ideally we always use real resources to mirror prod closely, but I see your point that that won’t always be possible as the app grows.
Another balance is to use real resources only in CI, with some concurrency control to make sure only a single build runs at a time. Locally, you continue to use mocks/stubs/fakes.
You still get assurance from testing but reduce the amount of places you run the "expensive" (time, resource, $) tests
> - Terraform to create the API gateway, database, lambdas, queues, Route 53 records: 1 week
- Terraform to create the IAM policies: 4 weeks
Perhaps it's because I am very familiar with the aforementioned tool and cloud, but 5 weeks for writing those resources gives me the impression of:
1. Lack of experience on AWS.
2. Lack of experience with Terraform.
3. Both.
I don't want to sound arrogant by any means but a Terraform project for something like that, documented, with its CI and applying changes via CD, would take me 4 days being generous.
I got handed a Terraform project for a GCP-based service. Simple dev, staging, prod environments. Secrets managed by Secret Manager, Cloud SQL running without a public IP address for prod (but accessible via SSH for admins).
I more or less gave up after a month of beating my head on the brick wall. We hired an expert. Took him another month to get it all more or less sorted. There were still aspects that we wanted that we could not get Terraform/GCP to do.
In the end, we dropped Terraform and went back to modifying the GCP manually.
That's a generic and well documented stack that utilizes GCP defaults and works out of the box. An "expert" should not take a month to fail to set it up.
I've deployed similar, additionally including GKE, via terraform in a day - Checking TF code for an example 3-env GCP/GKE/CloudSQL stack it's less than 300 LoC
That said, it's not all good - my ongoing complaint with terraforming GCP is that the provider lags behind the features & config available in GCP console - worse than the AWS provider - especially w/r/t GKE and CloudSQL
It's been a couple of years since I used AWS, and I remember when CDK was just coming out. My big question at the time was whether or not the CDK would alert you to errors at compile time and save the wailing and gnashing of teeth that comes with Terraform.
Yeah, it is much more mature now. Basically the CDK will generate CloudFormation templates at synthesis time, so any errors are generally caught then. If you use TypeScript it is even safer, since you know there are no missing parameters or anything like that.
I have not used Terraform that much, but they did launch a CDK for Terraform that does a similar thing: https://www.terraform.io/cdktf. Basically you write code and at synthesis time it converts it to Terraform templates.
Five weeks sounds about right based on my experience coming up to speed with Terraform. It's flexible enough to solve everybody's problems so it solves nobody's problems. Not until you inundate yourself enough with it to build the intermediary layer between what it does and what you want to do.
Same, I do it routinely and maybe the first time I ever did it, it took me a week but after that it was fast. But I may be being generous.
The only thing that could make that tough is if you put the Lambdas in a VPC. That can get tricky because you have to plan out subnets and whatnot but still not a week.
The AWS documentation is also extremely good with regards to what properties are on each resource. I can't speak for Terraform since I usually use CloudFormation / SAM directly. Maybe it's a Terraform problem?
> The only thing that could make that tough is if you put the Lambdas in a VPC. That can get tricky because you have to plan out subnets and whatnot but still not a week.
Yeah, it’s about 20 minutes if you use the VPC and Lambda modules from https://github.com/terraform-aws-modules. I could see a week if you had to learn all of this first with little prior experience but that’s true of everything. A newbie running a Linux colo server isn’t going to get all of the security & reliability issues right in less time, either.
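For what it's worth, the subnet planning that makes "Lambda in a VPC" feel intimidating is mostly mechanical. A sketch using only Python's stdlib `ipaddress` module — the two-public/two-private layout across two AZs is just an illustrative assumption:

```python
import ipaddress

# Carve a VPC CIDR into per-AZ subnets -- the planning step that trips
# people up. Pure stdlib, no AWS calls involved.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Take the first four /24s for a two-AZ public/private layout.
subnets = list(vpc.subnets(new_prefix=24))[:4]
layout = {
    "public-a":  subnets[0],
    "public-b":  subnets[1],
    "private-a": subnets[2],
    "private-b": subnets[3],
}

for name, net in layout.items():
    # AWS reserves 5 addresses in every subnet (network, router, DNS,
    # future use, broadcast), hence the -5.
    print(name, net, f"{net.num_addresses - 5} usable")
```

The Terraform VPC module mentioned above is essentially doing this arithmetic for you, plus the route tables and NAT wiring.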
I know those tools too. It’s kind of my job to know them seeing I work at AWS in ProServe.
But if someone gave me the same use case as the author. I wouldn’t suggest any of those tools. What’s the business case for introducing the complexity of AWS for someone who is just trying to get an MVP out the door who doesn’t know cloud?
I’ve been in the industry for 25+ years and only first logged into the AWS console in mid 2018. I had a job at AWS two years later. That gives me a completely different perspective
It's a joke. Or at least I've interpreted it as such. Still true that you always spend more time terraforming the little things compared to what you expected.
Four days sounds fair if you're experienced. If you're new to TF/AWS I could easily see it taking significantly longer. If you assume IAM is the devil and refuse to learn it, it will absolutely take a while to get correct
Agreed here. There is no reason setting up IAM policies through Terraform takes four weeks. Anecdotally, on my own personal projects it took me maybe three hours, or more, to set up IAM policies for AWS Lambda, ECS and RDS.
ITT: people who spent many hours learning proprietary (often unnecessarily complex) cloud platforms trying to convince others (and themselves) that it was the best use of their limited time alive.
Stockholm syndrome à la Big Cloud.
It's okay to be interested in elaborate cloud architecture things and learn them because of that, but don't sell it as one-size-fits-all thing that every little company needs.
Most companies don't need that complexity, but of course, Big Cloud with their billions needs to convince you otherwise.
Exactly. Of course GCP/Azure/AWS have great development kits, of course they make it easy to get a Docker application running for the first time within 1 minute. That is the sales model.
However, to be cost effective, you need to adapt your application to be more cloud native using their proprietary SDKs: Azure Functions/Lambda, CosmosDB, Blob Storage/S3, etc. The application gets cheaper, but you've now also bought yourself into the ecosystem and you're never migrating anywhere else.
And now the pricing increases. Or the cloud provider decides you shouldn't be a client anymore. Too bad. No easy way back.
There is still not much wrong with a webapp on a VM. You still need sysops, except classic sysops instead of cloud certified sysops.
At small scale, you can lower your complexity using cloud. You don't need k8s for a small operation, just spin a couple of VMs and set them up via a few lines of Ansible.
OTOH you can pick a managed database: you just get a connection string to a Postgres with failover and backups already taken care of. Same with queue services, email services, etc. They have really simple APIs.
You only need platform-specific knowledge when you start operating at a larger scale. By that time, you likely can afford to hire a dedicated SRE.
> You don't need k8s for a small operation, just spin a couple of VMs and set them up via a few lines of Ansible.
You can replace "couple VMs" with a dedicated Hetzner/OVH/Kimsufi server, it'll be the same except you won't get ripped off on egress bandwidth and performance.
I agree that AWS egress bandwidth is a rip-off, but cloud != AWS only. Many cloud providers, from DO to Vultr, offer sane egress prices.
The cheapest dedicated server at Hetzner (an excellent provider indeed) is €44.39 / mo, while their cheapest VM option is €4.51 / mo, literally an order of magnitude less. If you don't need the power of a dedicated server for your small project (or even 10 small projects), you don't need to buy it.
You can't have a free dedicated server, but many small projects can run either entirely within free tiers of some cloud providers, or for pennies a month based on usage.
OTOH you of course can run such small projects from your desktop at home, or maybe even from your NAS or router if they are beefy enough, entirely under your control, and for free!
The key value prop of the cloud for me is elasticity. Say, our project spins up more nodes in anticipation of daily waves of traffic, and then spins them down to save cost when the load goes way down. This won't work so well with long-term dedicated servers.
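To make that elasticity argument concrete: the scaling policy itself is a few lines once the provider exposes a capacity API. A hedged sketch, with all numbers purely illustrative:

```python
import math

def desired_nodes(current_rps: float, rps_per_node: float = 500.0,
                  min_nodes: int = 2, max_nodes: int = 20) -> int:
    """Target node count for the current request rate, clamped to a range.

    The thresholds here are made up; the point is that "ride the daily
    traffic wave" is a tiny piece of policy on top of an add/remove-capacity
    API -- something long-term dedicated servers can't give you.
    """
    needed = math.ceil(current_rps / rps_per_node)
    return max(min_nodes, min(max_nodes, needed))

print(desired_nodes(120))    # quiet hours: clamped to the floor of 2
print(desired_nodes(4200))   # daily peak: 9 nodes
print(desired_nodes(99999))  # runaway load: capped at 20
```

With dedicated servers you pay for the `max_nodes` case around the clock; with on-demand instances you pay roughly the area under the curve.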
OVH has their Kimsufi range which gives you low-power dedicated with unmetered bandwidth and the basic ones go under 10 bucks a month. It's 100Mbps, but unlimited traffic, and a real low-power CPU is most likely still better than cloud "vCPUs".
I don't know. Maybe I'm in a bubble, but it seems to me that knowing the basics of AWS (or some cloud provider) has become part of the standard developer's toolkit. With AWS specifically, there's so much documentation out there about getting started that I think you can have something up in a day or two on something like ECS or lambda (using something like the Serverless framework). And then when you need the more complex functionality, you are already in the AWS ecosystem.
If you are a startup trying to get a product to market, AWS is typically going to be a very small cost unless you are doing something very compute intensive (in which case something like Heroku, which the author recommends, certainly won't be cheaper anyway). The high bills only come later, if ever, after you've decided to create 20 databases and 50 apps for your 70 person startup.
With Fargate or Google Cloud Run, it really isn't. Assuming zero knowledge it's probably easier to learn how to build a docker container and call a binary to send it to the service than it is to setup ssh and rsync and a server to host your website.
That takes a Dockerfile, manages networking, secrets and CI/CD deployment. I have a few quibbles with what it does, but it generally works and is being maintained/updated.
Though there's lots of different ways to use AWS, so the experience your team brings may be a sort of complicated venn diagram. Even within a simple product, like deploying serverless, there's SAM vs serverless framework, vs scripted AWS cli. Using stacks or not. Using another layer on top or not like Terraform or CDK, and so on. Then the actual pattern of using it, Lambda layers, heavy or light patterns for securing things, using versions/aliases or not, and so on.
It wouldn't be unusual for a tech lead to pick some approach that ends up being new for the rest of the team. So some ecosystem with fewer choices would probably be faster.
Required? No, I'm not saying that. But yes, it's become the industry standard. If you don't know some AWS basics and you are a generalist web developer, you'd probably do well to learn them in order to make yourself a more marketable engineer.
There are plenty of good alternatives, but AWS is the 800-pound gorilla. You have to know at least a little bit about it in order to know why not to use it.
It's like saying you don't want to use React/Angular/Vue for your web app. There are good reasons not to, but at this point you should at least have some experience with web frameworks before making a technical decision not to use them. If your answer is "I don't know them and I don't want to learn them", that's fine for a personal project, but probably not a reason not to use them at your full-time startup. If your reason is "I know React, but for my specific use case, vanilla HTML/CSS/JS is better" then you are making a more informed decision.
I suppose my problem is that the enormous complexity paired with the utility billing feels like I'm trying to drink from a pool of water surrounded by enormous lurking predators (the 800-pound gorilla analogy seems apt). I think everyone has at least a story of a forgotten instance that billed them a bit more, but with "infinitely scalable" compute come infinitely scalable bills. There are instances of developers creating infinite loops in cloud functions that result in 6 figure AWS bills. Of course then the advice is to plead your case and hope for a credit, but I wouldn't expect everlasting benevolence from Bezos's machine.
I have other issues and could probably expound on them at length, but work to do and all that. I don't disagree with you that it's an important tool for engineers today (I've certainly got an account or two), but that doesn't mean I have to like it.
The basics give you enough to be dangerous, but cloud stuff has become complex enough to require dedicated people to do it well.
I'd prefer to hire someone dedicated to that and just let them work part time when the environment is simple over a developer with just the basics who's going to try to architect and run everything.
> I'd prefer to hire someone dedicated to that and just let them work part time when the environment is simple over a developer with just the basics who's going to try to architect and run everything.
And here I was thinking we moved to the cloud to get rid of the BOFH.
This is just another way of saying “you shouldn’t use AWS if you don’t know how to use it”
Yes, there’s a steep learning curve. But once you’re past that (or if you gained that knowledge in a prior role), AWS can hands down be the easiest, cheapest, and fastest infrastructure platform to use.
…if you know what you’re doing.
If you don’t know the ins and outs of AWS, then yes, you probably shouldn’t use it for your next MVP or startup idea.
Different strokes for different folks. (Or at least, use cases.)
We’ve found at work that if you already have the talent, the hyper scale cloud platforms are amongst the most expensive ways to manage infrastructure if you go all in.
For example $0.40/secret/mo is _expensive_ compared to the cost of an HA vault (not necessarily Hashicorp) setup. If you have 1,000 secrets but you only need to access any given secret once a day, that’s a lot of expense against just setting up your own. And then you can take it with you.
Beyond that, we’ve had a LOT more reliable performance from our current VPS provider than we ever got from EC2.
That’s not to say AWS is exactly without competition. We use S3 extensively because nothing compares for our usage.
If you don’t need Secrets Manager features like region replication and rotation, you can use Systems Manager parameters with the SecureString type. It’s effectively free. We use Secrets Manager but weren’t aware of the price difference.
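A back-of-envelope sketch of that price gap — per-secret list price as of writing, so verify against current AWS pricing before relying on it:

```python
# Secrets Manager charges per secret per month; standard SSM parameters
# have no per-parameter storage charge (throughput charges aside).
SECRETS_MANAGER_PER_SECRET_MONTH = 0.40  # list price, check current pricing
N_SECRETS = 1_000

secrets_manager_monthly = N_SECRETS * SECRETS_MANAGER_PER_SECRET_MONTH
print(f"Secrets Manager: ${secrets_manager_monthly:.2f}/mo")
print("SSM Parameter Store (standard, SecureString): ~$0/mo")

# The SSM call itself, for reference (parameter name is hypothetical):
# import boto3
# boto3.client("ssm").put_parameter(
#     Name="/app/db-password", Value="...", Type="SecureString")
```

At 1,000 secrets that's $400/mo versus roughly nothing, which is the kind of line item the parent's HA-vault comparison is about.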
Lol. This may be true, but it's kind of pointless, as an API on localhost isn’t very useful unless you’re automating your home. Of course it’s easier to hack something out on localhost than to design for actual users.
I think it makes more sense to build incrementally with the end in mind. So writing those terraform scripts will take less time if you initially write them to deploy to localhost for testing.
API gateway is simply how you expose Lambdas to an HTTP interface in AWS. It was the easiest way until they recently unveiled a way to expose the Lambda directly. You can also use ALBs (Application Load Balancers) and CloudFront to expose Lambdas to HTTP.
Either way, Lambdas are hard to debug locally; often I just deploy them to test (since deploying is easy). Or I write my code such that it bootstraps differently when launched locally vs in Lambda. In any case, unless it is a very complex app with lots of external dependencies, 4 days is a bit much.
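A minimal sketch of that bootstrap-differently trick: the `AWS_LAMBDA_FUNCTION_NAME` check uses a real env var the Lambda runtime sets, while the local HTTP shim is purely illustrative:

```python
import os

def handler(event, context):
    # The actual business logic, shared by both entry points.
    return {"statusCode": 200, "body": "ok"}

def running_in_lambda() -> bool:
    # AWS sets this env var inside the real Lambda runtime; locally it's absent.
    return "AWS_LAMBDA_FUNCTION_NAME" in os.environ

def serve_locally(port: int = 8080):
    # Local bootstrap: serve the same handler over plain HTTP for debugging.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Local(BaseHTTPRequestHandler):
        def do_GET(self):
            resp = handler({"path": self.path}, None)
            self.send_response(resp["statusCode"])
            self.end_headers()
            self.wfile.write(resp["body"].encode())

    HTTPServer(("127.0.0.1", port), Local).serve_forever()

# Entry point decides which world it's in:
#   if __name__ == "__main__" and not running_in_lambda():
#       serve_locally()
```

Lambda invokes `handler` directly; running the file locally falls through to the dev server, so the same code path gets exercised both ways.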
But don’t try to create a REST API Gateway with more than 200 resources, or CloudFormation will randomly start failing.
Or try to add more than 100 rules to your ALB, because it’ll be impossible.
My biggest issue with AWS is that the limits are so arbitrary, and seem to solely exist due to terrible design decisions.
If my local express server, or nginx can deal with 100 endpoints, how is it possible for this multi billion dollar infinitely scalable service to not do the same…
Interesting, I haven't had the experience with CloudFormation.
At that scale, however, I tend to group my Lambdas as microservices, not per endpoint. It helps with cold start time as well. So for example, if I have a "page" resource, I don't make that 5+ endpoints, I make it a wildcard / prefix match and do the routing inside of the service lambda.
Maybe you legitimately have 100+ different microservices, in which case, I don't doubt that is a problem, I just haven't experienced it.
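A hedged sketch of that prefix-match routing, using the API Gateway proxy-integration event shape — the resource names and handlers here are made up:

```python
# One Lambda per service, routing internally by path prefix, instead of one
# API Gateway resource per endpoint.

def get_page(page_id):
    return {"statusCode": 200, "body": f"page {page_id}"}

def list_pages():
    return {"statusCode": 200, "body": "all pages"}

def handler(event, context):
    # API Gateway proxy integration passes the full path and method through.
    path = event.get("path", "/")
    method = event.get("httpMethod", "GET")
    parts = [p for p in path.split("/") if p]

    if parts[:1] == ["pages"] and method == "GET":
        return get_page(parts[1]) if len(parts) > 1 else list_pages()
    return {"statusCode": 404, "body": "not found"}

print(handler({"path": "/pages/42", "httpMethod": "GET"}, None))
```

On the API Gateway side, a single greedy resource like `/pages/{proxy+}` forwards everything here, so the 200-resource CloudFormation ceiling stays far away.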
The 200 resources thing has been a really frustrating problem for us too. We've started migrating ours to separate API Gateways at a "service" level and then mapping paths to different APIs using the Custom Domain API Mappings.
> We've started migrating ours to separate API Gateways at a "service" level and then mapping paths to different APIs using the Custom Domain API Mappings.
Yeah, that was suggested to us too, but it felt like a dirty hack to me. What is the point of having an API gateway if you can’t have the single one (our microservices hook themselves up to that single gateway).
Our solution involves a custom CDK resource that keeps re-creating API gateways until it gets a root resource ID starting in the lower range of the alphabet, ensuring it’ll always be found by CloudFormation.
>This may be true, but it's kind of pointless, as an API on localhost isn’t very useful unless you’re automating your home. Of course it’s easier to hack something out on localhost than to design for actual users.
When did developing software on your own machine stop meaning "design for actual users"?
You should have a strong and reliable deployment for production, yes. But not being able run a baby instance locally just as easily means sacrificing your development loop.
Designing for me as a single user is different from designing for other users. Other users can’t hit my localhost.
The article talked about how much time it takes to get working so it seems like the author took shortcuts to get it working locally.
I agree that it’s a good practice to dev so deploying locally works as well as deploying remotely (or to lots of environments). But this is different than developing only for localhost.
We had a 'microservice' running on a server that had a bunch of other random things on it, it ran for 4-5 years until the server got decommissioned.
Nobody knew what all ran on that server, worse yet nobody knew that particular service ran on it. The person who wrote it was long gone.
It took a day to troubleshoot, a day to figure out what actually happened, and 5 days to get the server back up and running.
A couple months later, someone shut the server down again. It only took three days to fix it the second time.
In order to ensure this would never happen again, there were about 15 meetings, 20 people were involved, and the service was re-written and hosted on Azure (along with some of our other stuff). It's probably failed about 100 times since then, in about a hundred different ways.
I've been able to get a lot done with API Gateway, Lambda, S3, RDS, SQS, Lex, and ElasticSearch. I work for a Fortune 200 company who's risk averse and views "the cloud" with suspicion. My team's ability to get so much done is starting to change that perception.
Sure, if you're in a startup and you're doing most of the infrastructure and operational work yourself then working on-premise is often advantageous. If, like me, you're working for a Fortune 200 company and it takes multiple ServiceNow tickets to get on-prem hardware, a lead time of several months to get it through procurement and subsequently racked and stacked, and working with infrastructure solution engineers throughout the process - trust me, AWS is a much better choice and will enable your team to get stuff done.
If you are working for a startup then beware, as you grow avoid the temptation to build a data center - go to the public cloud. I would argue since that's where you're going to be hosted anyway - assuming your successful growth - then you should really consider just starting out there in the first place.
> If, like me, you're working for a Fortune 200 company and it takes multiple ServiceNow tickets to get on-prem hardware
What's stopping them, after they "embrace the Cloud," from making it take multiple ServiceNow tickets and several months to change an IAM policy? This has been my experience in very large corps that do use AWS. Typically it's also made a violation of policy to use a team-specific cloud account.
P.S. After having helped a mid-sized company migrate some core functions from DC to cloud, I agree with your startup advice.
You are correct, nothing stops us from taking our terrible on-prem practices and applying them to the cloud except for one thing - it will be more obvious that we screwed the pooch because they let some renegades in before they were able to nail everything down. Now they can't hide behind their gobbledygook BS when they try to apply their existing practices to the cloud. My team is respected so much that we're able to push back on their nonsense in public meetings with the suits. Simply put, I'm enjoying First Mover advantage. Also, doesn't hurt that before joining this team I was on the Enterprise Architecture team and I literally wrote our Cloud Policy! I think that was well-played, even if I say so myself! :)
I use AWS all the time and for startups I 100% agree with this. Sure, I could get a cool stack up and running in AWS much faster than this article suggests, but the infrastructure by itself delivers ZERO value even if it’s shiny and fun to work on. We must remember this.
Start on Heroku, maybe with your own RDS. This removes so many decisions and ongoing overhead and lets you focus on building the thing that actually delivers value.
A few points (worked at AWS for 5 years and then left to start a company, which runs on AWS):
- Yes, if you don’t have AWS, Azure or GCP experience it can be hard. Harder than it should be. But this is why I try to make things simple. Run node / express in lambda. Use managed services. Use CDK so the IaC abstraction is easier. Definitely not 4 weeks for the IAM policies.
- You get tons of credits as a VC backed startup (in all providers) so cost is not that much of an initial issue
- Yes you need to pay attention on the expenses, setup budgets and budget alarms, and run cost optimizers often
The author's not wrong. Cost comes with lack of accountability in my experience. In turn, my devsecops dept (~20 people) has kept costs down by holding monthly AWS accountability meetings. "Who owns this and why does it exist?" is the leading question.
> The all-you-can-eat buffet problem
Valid point. But I've gotten far in my career by specializing in AWS. It's not going anywhere soon. It's the one cloud provider I would say you should go all-in on. Azure maybe next. GCP? Come on. Conversely, I just got an email from Heroku saying they're retiring one of my free-tier databases that I still use.
> Culture of simplicity eats strategy of complexity for breakfast
Orgs, please retire this saying. I hear this everywhere. It's lost a lot of meaning. Just spell out what your org does better than the rest of the pack.
The G. Products from Google are often transient, support is often non-existent. It doesn't create confidence for people who might be investing big into proprietary cloud features.
Lots of people have had their servers randomly deleted one day because a google script thought they were a bot. And then of course, there is no one to call at google to have it resolved.
Don't use AWS, GCP or Azure. Use Digital Ocean and the likes. Cheap VM compute, managed databases. And for the most effective setup, just get an affordable VM and chuck https://dokku.com/ onto it. Boom you have your self hosted Heroku. Cheap bandwidth, opportunity to scale.
AWS is a fools errand at the startup level unless you need some of their specialised services. Stick to "tier 2" cloud providers like Digital Ocean or Linode. If all you need is servers, database and storage, then don't waste your money on the major cloud providers. They are the wrong choice for basic compute.
In Netflix more than 10 years ago, it's more like this: a single engineer builds a deployment/management tool: 1 - 2 months. Every other engineer creates a new and fully configured cluster: minutes.
Seriously, can we please get over the fetish of expressing everything in some DSL, YAML, or whatever "specification language"? Such tools are powerful and flexible, but they should have no place for engineers who just want to provision resources. The tools violate almost every UX principle, in particular the following:
- Discoverability. Very little. One has to read tons of docs and SO posts to figure out what needs to be done. You want to pass in some environment variables? A typical answer from those who use Nomad/TFE: easy, just pass in this 200 lines of Jinja template. Really? Really? You call this ease of freaking use?
- Affordance. None.
- Constraints. None to speak of, when you only see the errors after you submit your 1000-line YAML scripts.
- Consistency. Maybe, but still, embedding a Jinja template to pass in variable is an insult to UX.
It's an unforgivable sin to ask me to learn your shit.
As a solopreneur running a SaaS and various apps/add-ons on other SaaSes on AWS for 7 years now, I'm inclined to agree.
In fact, I've already migrated a fair chunk of my workload off AWS Lambda onto constantly running fly.io VMs.
It's significantly cheaper than serverless (when you're past the free tier), the servers just restart if they crash (as opposed to running up a six figure AWS bill), and it's less complicated operationally (it's just a VM, less need to pipe messages with SQS, figure out IAM, etc)
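For a rough sense of where that break-even sits, a sketch comparing Lambda list prices (x86, as of writing — check current pricing) against a small always-on VM; the $5/mo VM figure is hypothetical:

```python
# Lambda list prices (x86); verify against the current AWS pricing page.
PER_MILLION_REQUESTS = 0.20
PER_GB_SECOND = 0.0000166667

def lambda_monthly(requests_m: float, avg_ms: int = 100,
                   mem_gb: float = 0.5) -> float:
    """Monthly Lambda bill for requests_m million requests, ignoring free tier."""
    gb_seconds = requests_m * 1e6 * (avg_ms / 1000) * mem_gb
    return requests_m * PER_MILLION_REQUESTS + gb_seconds * PER_GB_SECOND

VM_MONTHLY = 5.00  # hypothetical small always-on VM

for m in (1, 10, 50):
    print(f"{m}M req/mo: lambda=${lambda_monthly(m):.2f} vs vm=${VM_MONTHLY:.2f}")
```

With these (illustrative) parameters the crossover lands in the single-digit millions of requests per month — past that, a flat-priced VM that just restarts on crash gets cheaper, which matches the parent's experience.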
I work for AWS and I get paid decent money to work with companies and help them become “cloud native”. My specialty is “application modernization” and I avoid “lift and shift” projects like the plague. Even though I realize that doing lift and shifts first is the right answer sometimes. I think I can confidently state that I know AWS pretty well.
That being said, in my personal life if someone ever came to me and said that they were starting a project from scratch or even if I were starting a hobby project from scratch where I saw Lambda + DynamoDB wasn’t the right answer, I would just use Lightsail and simple monolithic application using whatever frameworks are appropriate that I already knew.
AWS Lightsail is a simple fixed priced VPS. I’m not advocating using Lightsail over another VPS provider. It would just be my preference because I know how to transition to full fledged AWS later.
Hey, coincidentally I am about to start a role in "Application Modernization" as Cloud Solution Architect at MSFT (though I only have experience with AWS).
Do you mind if I ask you some further questions via the email in your profile?
I’m not the sales guy. I’m not here to do sales talking points.
I’m going to speak from an on the ground hands on keyboard implementation person.
AWS offers plenty of hosted versions of open source solutions and API compatible services like DocumentDB with Mongo compatibility.
If I’m working with a customer that prefers the open source solution and there is an equivalent on AWS, I’m going to suggest that. My goal is never to introduce too much new technology to an organization unless there is a compelling need.
I’ve recommended everything from a straight lift and shift, to hybrid, to full on all in on AWS depending on the use case. I’m not dogmatic and I’ve never been told “get the customer all in on our services so we can lock them in”. I’ve implemented pure AWS CI/CD solutions, integrated with Azure DevOps, done lift and shifts with Jenkins, etc.
I’m judged completely by outcomes and whether the customer is satisfied.
But I’ve been railing against worrying about “lock-in” way before coming to AWS. I’ve been part of numerous large-scale migrations and implementations. If you’re at any scale, you’re always both technically and organizationally “locked in” to your infrastructure choices, and migrating involves dealing with CxOs, PMO, retraining, security, regressions, etc. It’s usually much easier to just have a conversation with your account manager.
>Let's face it, choosing AWS is the cloud computing version of "nobody ever got fired for buying IBM". There is perceived safety in choosing a popular offering — it's what everybody else does.
This is the only line that actually matters. Are there better and cheaper options for your organization? Almost certainly, but no one ever got fired for picking AWS.
Article doesn’t really present an argument. Pretty much starts with ‘you don’t want the cost and complexity’ and then goes on from that point as a given.
A more useful article would actually walk through the cost/complexity trade-offs.
One thing I have been wrong about for about 2 years is predicting companies would pull back from using the cloud. What I see in corporate AWS accounts is a lot of waste. I thought once companies started tightening spending they would look at their cloud spending and shut a bunch of it down. When an EC2 instance only costs $30 a month to leave running, it's easy to forget about it. You get enough of those across a corporate enterprise and you are talking about serious money.
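Even a crude sweep for those forgotten $30/mo instances pays for itself. A sketch where the filter is plain data-wrangling (so it's testable offline) — in real use the fleet would come from boto3's `ec2.describe_instances()`, and the tag convention is an assumption:

```python
from datetime import datetime, timedelta, timezone

def suspicious(instances, max_age_days=90, now=None):
    """Flag long-running instances with no Owner tag as waste candidates."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        i["InstanceId"]
        for i in instances
        if i["State"] == "running"
        and i["LaunchTime"] < cutoff
        and "Owner" not in i.get("Tags", {})
    ]

fleet = [
    {"InstanceId": "i-old", "State": "running",
     "LaunchTime": datetime(2020, 1, 1, tzinfo=timezone.utc), "Tags": {}},
    {"InstanceId": "i-owned", "State": "running",
     "LaunchTime": datetime(2020, 1, 1, tzinfo=timezone.utc),
     "Tags": {"Owner": "data-team"}},
]
print(suspicious(fleet))  # -> ['i-old']
```

Multiply each hit by its instance-hours and you get the "serious money" figure that rarely surfaces until someone goes looking.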
Good point, I explained that poorly (sorry, was half paying attention to a conference call). I meant "pull back" as in reduce their number of instances, which would reduce their monthly spend.
I am currently in the middle of setting up AWS and that decision graph made me chuckle because it resonates quite deeply.
I need to do GPU inference but I don't want to run the machine 24x7. I may use it for about 4 hours per day at best. Lambda doesn't offer GPUs and neither does ECS+Fargate.
It seems like I could setup an endpoint using Sagemaker and then destroy it when no longer needed, and automate all of this but it feels quite messy.
The other route is perhaps I can launch an instance every day with ECS and then get rid of it.
All these routes seem quite inefficient. There seems to be something called Elastic inference where I can provision the right amount of GPU resources - but it seems like I'll need a spare EC2 instance to do that if I'm not mistaken, which is not ideal either.
I guess all this stems from the fact that there is no straightforward virtualization for GPU workloads and so they have to provision them 1:1 which currently they are not equipped to do.
Has anyone run into a similar problem and found a more elegant solution? All of the above are very messy. Is there some obvious choice I am missing?
Depends on what "4 hours per day" really means. If you want an interactive endpoint, putting in the work to set up an ECS task you can start and stop feels like the best approach. If you have longer-running inference tasks and just want to pick up results asynchronously, Batch (which is a layer on top of ECS) seems like the way to go.
SageMaker might have an abstraction which is a closer fit for your particular use-case, but I'd be wary of potential cost excesses; running on raw EC2 and automating the lifetime somehow is inevitably going to be the cheapest route.
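A sketch of that start-a-task-and-let-it-exit route, assuming ECS on EC2 capacity with a GPU resource requirement (Fargate has no GPUs); cluster and task names are hypothetical, and building the request dict separately keeps the logic testable without AWS:

```python
def run_task_params(cluster: str, task_def: str, gpus: int = 1) -> dict:
    """Request for a one-shot GPU task; ECS schedules it onto EC2 capacity."""
    return {
        "cluster": cluster,
        "taskDefinition": task_def,
        "launchType": "EC2",
        "count": 1,
        "overrides": {
            "containerOverrides": [{
                "name": "inference",  # container name in the task definition
                "resourceRequirements": [
                    {"type": "GPU", "value": str(gpus)},
                ],
            }],
        },
    }

params = run_task_params("ml-cluster", "gpu-inference:3")
# import boto3
# boto3.client("ecs").run_task(**params)
print(params["launchType"], params["count"])
```

Pair this with a GPU auto-scaling group that scales between 0 and 1 instances, and you only pay for the minutes the task actually runs; the container exiting is what releases the capacity.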
The task takes about 1-4 minutes. In an ideal world, it can start immediately and then shutdown, but more realistically it will have to spin up or batch things up at a convenient point in time. Sagemaker seems more expensive that doing it via ECS.
If your GPU inference can run on an Intel integrated GPU, you could rent a dedicated server from OVH for ~$130 a month and use the integrated GPU on that. I don't know about affordable dedicated servers from mature providers with Nvidia GPUs.
> Unless you hire a good (read: expensive) devops person to look after your infrastructure, clickops-ing your way through the AWS console will likely leave behind a pile of unused instances and components that will eat into your budget.
While this is true-ish (finding lost resources or just not creating them in the first place is not that complex), when the day comes that you will need/want to move to one of the cloud providers you will need a good devops to handle that hybrid cloud environment and making the transfer as painless as possible.
Cross-cloud routing, DB migrations, and setting up secure access for all of it are less complex, in my POV, than cost-managing your cloud account.
I think AWS is shooting themselves in the foot with their ridiculous EKS offering. The fact that it is necessary to first download a 3rd-party tool (eksctl) to get anything working is insane. Then, getting basic things working, like connecting a load balancer, involves importing all kinds of random policies and configuration (which do who knows what) from random GitHub repositories, which is perhaps even more ridiculous. The same thing in GCP takes nearly zero effort.
I can understand why AWS would try to steer people away from Kubernetes since it has a commoditizing effect. However, it could end up steering people away from AWS entirely.
A whole lot of nothing being said. Use the right tool for the right job.
If you're familiar with AWS, use it and get running and focus on delivering product instead of fine-tuning all the settings and worrying about the perfect cloud environment.
My honest opinion: using some third-party tools/services on top of AWS only creates tech debt. Over the last two years, AWS has improved a lot of their tools to the point that any developer can use them. Developers should be able to understand how their applications run in the cloud provider as well.
There’s been an SEO template since the Slashdot ages: find a prevailing wisdom, write a naive clickbaity post, profit. Sometimes I think authors are earnestly wrong, sometimes I think they are just fake. Sadly, it’s still profitable.
I thought that one day I’d interview someone who wrote a post like that and figure out for real. But I’ve been waiting decades so it probably won’t happen.
Step 1: Write an incendiary title
Step 2: Make definitive statements meant to be applied broadly but actually targeted at a specific situation that the author is experiencing, or comes from the author's own problems with something.
Step 3: Make sure to cast doubt and insecurity, making somebody else feel like they made the wrong choice, even when everything is working perfectly
Step 4: Street cred improved!
If you hire people who don't know how the cloud works, then of course their time will be sucked up by learning how it works. If you hire people who know how the cloud works, it is a productivity multiplier. Use the tools that you know.
BUT. If you have to build a giant wooden sailing ship, and all you know how to use is a Swiss army knife... and you want to get that ship done this century..... you need to learn new tools.
I generally agree that no one that isn't a professional devops person (or has spent months tuning IAM and Route53 policies) should be using AWS. That said, GCP is so much easier to use, and easily usable by a novice, because it has a lot of good defaults.
Compute and egress costs can be prohibitive at scale, but features like storage + BigQuery (an OLAP SQL db where you only pay for queries) are basically free for low-to-moderate-volume workloads.
Disclaimer: not a backend or web engineer, I mostly write embedded software, but inevitably need to implement services from time to time (and had a startup at one point)
My response is "yes but what about databases". There are any number of ways of hosting my application - "put it on a VM" is a perfectly reasonable approach, particularly as my preferred platform is Elixir which is pretty monolithic anyway. That's fine for the app, but what about the DB? I think I know just enough to know that hosting Postgres properly (i.e. reliably and performantly with appropriate backups) is not that easy, and I'd like someone else to do it for me.
For my startup I used App Engine (Flexible Environment) and Cloud SQL. That worked well in that I had the "two instances behind a load balancer" that you want for seamless upgrade, and managed SQL without having to delve into all the many Google Cloud services for networking etc. etc. Everything else was just 'more elixir' which is easy to test locally.
This might be true if you’ve never used AWS before. But if you know AWS it doesn’t take anywhere near that long.
I’m building a PBBG and have it on AWS. Built the site in .NET with SignalR, MartenDB/PostgreSQL. Needed to host it.
2 evenings to write a cloudformation script which builds a VPC. Public and private subnets. RDS. A tiny instance to act as a nat gateway (for servers in private subnet). A small server with HAProxy for load balancer. A tiny server for redis. 2 web servers. Plus some sh scripts to build the project. Zip. Ssh jump to the server and deploy. Total cost is like $42/m.
Used a tiny instance for nat gateway cos aws nat gateway costs $32+ingress. Used tiny instance for redis as it’s only used for signalr connections across servers and services. Tiny instance for HAProxy cos aws application load balancer is $15/m. I could consolidate 3 of the servers into 1 but I chose not to.
You can build flexible things on AWS. The problem with AWS is it’s very easy to just spin up random stuff and not care about cost and blow out a budget quickly.
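The trade-off described above can be tallied up in a few lines. This sketch uses the figures quoted in the comment plus assumed list prices for a tiny instance (~$3/mo, e.g. a t4g.nano) and the smallest ElastiCache node (~$12/mo); actual prices vary by region:

```python
# Back-of-envelope comparison: managed AWS services vs. tiny self-run
# instances, using the comment's figures. The tiny-instance price (~$3/mo)
# and ElastiCache price (~$12/mo) are assumptions; check current pricing.
TINY_INSTANCE = 3.00

self_managed = {
    "nat (tiny instance)": TINY_INSTANCE,
    "haproxy (tiny instance)": TINY_INSTANCE,
    "redis (tiny instance)": TINY_INSTANCE,
}

managed = {
    "NAT Gateway (base, excl. data)": 32.00,
    "Application Load Balancer (base)": 15.00,
    "ElastiCache (smallest node, approx)": 12.00,
}

saving = sum(managed.values()) - sum(self_managed.values())
print(f"managed services: ${sum(managed.values()):.2f}/mo")
print(f"tiny instances:   ${sum(self_managed.values()):.2f}/mo")
print(f"difference:       ${saving:.2f}/mo")
```

The ~$50/month gap is most of the difference between this setup's $42/month bill and what the all-managed equivalent would cost, at the price of running (and patching) three extra boxes yourself.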
> Used a tiny instance for nat gateway cos aws nat gateway costs $32+ingress
AWS NAT gateway is certainly pricey. And doesn't even support NAT traversal which means P2P/STUN stuff is out (this is in contrast to GCP which does support it).
If you are just starting out, you don't have to use Terraform; use the console. Nothing stops you from using the managed services from a Heroku box either. Need a queue for your API? Heroku->SQS. Need storage? Heroku->S3. The industry always talks about best practices but rarely mentions maturity models, which this article gets close to bringing up but stops short.
So what, the suggestion is to go with Heroku instead?
I think fargate + docker is super easy to setup, run and maintain. Maybe Heroku makes it a little bit easier, but that's about it. Once you leave the Heroku ecosystem you'll have lost all the time you saved.
I spent a few evenings going the ECS route and the complexity and cost of just a single api available to the internet and a database kept ballooning to way more than what I wanted to pay for and deal with.
Fargate, image repo, NAT gateway, I don't even remember all the nonsense at this point but it was ridiculous.
I took a look at a few alternatives and DigitalOcean's App service was night and day easier, faster, and cheaper to deal with.
TLDR - Author seems to value simplicity & focusing on value for users. Doesn't like AWS/Google/etc. because they have too many advanced features. Suggests using Heroku instead.
The first part I think most anyone would agree with.
The 2nd part doesn't make sense. AWS/Google etc all have simple ways to setup a web app & database without messing with containers, microservices, event architecture etc.
As for cost, they all offer generous free tiers for learning & hobby projects.
If you want to make a business out of it, check out their very generous startup programs, which offer over $100,000 of free services. Azure's is over $150k.
Digital Ocean, Cloudflare and many others also offer great incentives to get you to build on their platform for cheap or free in hopes that you succeed & stay with them.
Always remember, HN itself runs on a single dedicated server from a fairly small provider (m5hosting.com). IIUC, they have a backup server. But still, if it's good enough for HN, it's good enough for lots of us.
- Time it took to learn the skills to write a scalable service that can handle 100K events per second: 3 years
- Time it took to write that API on localhost: 4 days
- Time it took to learn Terraform and some AWS services, to create the API gateway, database, lambdas, queues, Route 53 records: 1 week
- Time it took to learn AWS IAM policies to create the IAM policies in Terraform: 4 weeks
Author conveniently left out a few bits of information. Once you learn IAM and Terraform, I'm doubtful it takes you another 4 weeks to setup the policies for a new project.
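For readers wondering what those IAM weeks actually produce: the output is usually a stack of least-privilege policy documents like the one below. A minimal sketch, with a hypothetical bucket name, queue name, and account id:

```python
import json

# A minimal least-privilege IAM policy of the kind that learning curve goes
# into: one role may read objects from one S3 bucket and consume one SQS
# queue. The bucket/queue names and account id are hypothetical placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadAppBucket",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-app-bucket/*",
        },
        {
            "Sid": "ConsumeEventsQueue",
            "Effect": "Allow",
            "Action": [
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage",
                "sqs:GetQueueAttributes",
            ],
            "Resource": "arn:aws:sqs:us-east-1:123456789012:example-events",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

The hard part is not the JSON syntax but knowing which of the hundreds of actions each service needs and how resource ARNs are scoped; once learned, stamping out a new policy for the next project is quick, which is the parent's point.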
In my experience Terraform was a horrible pain point, and yet I'd happily suffer it again. We can spin up and spin down transient UAT environments which are pristine and match PROD exactly in a matter of seconds and we can robustly test infrastructure changes. The improvement in developer productivity that entailed was worth the cost.
There is a middle way, just set your boundaries on which AWS services you want to use. 1 x EC2 server + 1 x application load balancer and a few S3 buckets works well for my hobby projects (ALB is just for multiple domain SSL and reverse proxy). You can run docker on the EC2 and deploy containers. Yes there is a small amount of setting things up in the AWS UI and some bash scripting for deployment, but then you'd have to configure your DNS and firewall somewhere with a VPS anyway.
Why do you need IAM policies at all to run a service that works on localhost on AWS? I would expect that you need IAM policies if your app integrates with the AWS platform, which it definitely wasn't doing on localhost, or if you are running multiple services in the same account and want to lock them down (e.g. not have services access each other's S3 buckets and such).
AWS is very good and cheap once you know how to use it and what parts. The alternatives most likely run on top of AWS anyhow. I use terranix https://terranix.org/ instead of straight terraform. I can compile, package, deploy everything just with nix.
These kind of blog posts treat this as a technical problem instead of a business problem.
Until you get some unicorn that says "we are profitable at price $X and all of our competitors are losing money at $X + $Y, and it's because of our software architecture and infrastructure choices", nobody is going to be convinced.
I can see how this makes sense for a startup etc that has passed some threshold of operational complexity. As an “indie hacker” there is no way I’d be able to move out of the cloud without my costs going up by an order of magnitude.
I actually totally disagree. With the various development frameworks/CLIs, AWS has the ecosystem benefits that can make hosting on it a breeze, and leave more time to focus on delivering value to the customer.
I really like Vercel's approach of mixing cloud providers behind the scenes. They deliver features (like edge functions) using one cloud, basic hosting using another one, etc. It's all invisible to you as the customer and painlessly abstracted.
Less than $5/month. Yes, on AWS. Serverless (the genuine kind, which scales to zero with pay-per-request) is pretty much free until you have actual users, and once you have actual users, you have actual revenue to pay your cloud bills. Unless your ($revenue / $hosting_costs) is less than 1.0, in which case, you don't have a business.
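The "pretty much free" claim is straightforward to model. This sketch assumes Lambda-style pricing of roughly $0.20 per million requests and ~$0.0000167 per GB-second, with a monthly free tier of 1M requests and 400,000 GB-seconds (all figures are assumptions; check current pricing):

```python
# Rough model of pay-per-request serverless cost at low volume.
# Assumed figures (verify against current pricing): $0.20 per 1M requests,
# ~$0.0000167 per GB-second, free tier of 1M requests + 400,000 GB-seconds.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000167
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000

def lambda_monthly_cost(requests: int, avg_ms: float, memory_gb: float) -> float:
    """Estimated monthly bill in USD for a function at the given traffic."""
    gb_seconds = requests * (avg_ms / 1000) * memory_gb
    req_cost = max(0, requests - FREE_REQUESTS) * PRICE_PER_REQUEST
    compute_cost = max(0.0, gb_seconds - FREE_GB_SECONDS) * PRICE_PER_GB_SECOND
    return round(req_cost + compute_cost, 2)

# 100k requests/month at 100 ms on 128 MB sits entirely inside the free tier.
print(lambda_monthly_cost(100_000, 100, 0.128))  # 0.0
```

By the time the function is doing 10M requests a month, you are paying a few dollars, and presumably have the revenue to match.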
> "(LISP) programmers know the value of everything and the cost of nothing". A specific technology product never exists in a vacuum — it has to communicate and co-exist with other components in the system. There are costs associated with every choice, often hidden costs.
An odd choice of quote, considering the author is promoting choices that cost orders of magnitude more money in the earliest stages, and inevitably provoke high migration costs when it comes time to move off those platforms.
> Cultivate a culture of ruthlessly fighting complexity
Again, an odd claim. Stacks like AWS Lambda and DynamoDB let me forget about scaling concerns* (asterisk because this is true in the early stages, slightly less true later, but still mostly true compared to traditional architecture). Those concerns absolutely rear their head when handing off to a site like Render that refuses to publish public pricing for their largest database instances, or talk about very common usecases like read replicas for analytics workloads.
> the harsh truth is that neither Lambda Functions, nor Kubernetes, nor Kafka on their own will magically make your app work correctly, be performant and deliver value.
But Redis, PostgreSQL, and PaaS-style service deployment magically will? You mean, early startup CTOs need to actually think about the architecture they propose to build to satisfy business needs? gasp
> "Why do we think this choice will provide the most value for users compared to the alternatives?"
Because serverless means not needing to hire DevOps.

Because most companies running Kubernetes do not get anywhere near the ~38+% efficiency (last time I ran the numbers, and that's for production environments, not even including staging/testing/development environments) they need to make Kubernetes more cost-efficient than AWS Lambda, because developers just don't have time to figure out why the hell their services need a guaranteed vCPU, won't perform with less, and in the meantime their services are using less than 20% of the resources they requested - and they particularly don't have time to figure it out when Customer Support is happy, Product is happy, and Finance will cough up whatever budget is needed so long as Engineering says that it's "necessary".

Because founders who actually think about optimizing for value will optimize for what is scarce, and what is actually scarce is not money (plenty of money out there looking for the right investment opportunities that check all the right boxes), it is people. Serverless means hiring fewer people because you hand off undifferentiated heavy lifting.
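The break-even framing above can be sketched numerically. This uses assumed list prices (an m5.large at ~$0.096/hr with 8 GiB of RAM versus Lambda at ~$0.0000167 per GB-second); the comment's ~38% figure is presumably higher because it folds in control-plane costs, headroom, and operational overhead that this raw-compute comparison ignores:

```python
# Sketch of the Lambda-vs-self-managed-compute break-even, raw compute only.
# Assumed figures (verify against current pricing): m5.large at $0.096/hr
# with 8 GiB RAM; Lambda at ~$0.0000167 per GB-second. Real break-even
# utilization lands higher once cluster overhead and staffing are included.
EC2_HOURLY = 0.096
EC2_GB = 8
LAMBDA_PER_GB_SECOND = 0.0000167

# What one GB-second costs on the EC2 box if it were 100% utilized.
ec2_per_gb_second = EC2_HOURLY / EC2_GB / 3600

# Utilization at which the EC2 box's effective GB-second price matches Lambda's.
break_even = ec2_per_gb_second / LAMBDA_PER_GB_SECOND
print(f"raw break-even utilization: {break_even:.0%}")
```

Even on this generous raw-compute basis, a cluster needs to sustain roughly 20% real utilization to beat Lambda on price — and as the comment notes, services requesting guaranteed vCPUs they barely use make that surprisingly hard to hit.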