
What do you think the user tolerance for paying to host large images would be? Obviously bandwidth and storage cost money, and if a site is to allow 4x larger photos (driving 4x larger costs), the site needs to make 4x more money somehow.

Page views and ads and referral arrangements to photo printers aren't going to automatically scale with photo sizes for a free site.



> if a site is to allow 4x larger photos (driving 4x larger costs), the site needs to make 4x more money somehow

Not so fast. Bandwidth and storage needs scale roughly with resolution, but other costs scale more slowly or even not at all. Note that even storage (and to a lesser extent bandwidth) has a per-action cost that does not depend (much) on resolution. And storage costs include metadata, which doesn't scale with resolution.

I've been working through a cost model for a (different kind of) image service and there are surprises.

For a first approximation, work through how you might build such a site on Google App Engine and/or Amazon Web Services and build a parameterized cost model using their fee schedules for different things. Fiddle with the values for the parameters.
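A minimal sketch of what such a parameterized model might look like. Every price and workload number below is a made-up placeholder, not an actual AWS or GAE fee, and the class and function names are only illustrative; the point is just to see which terms move when file size changes.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    # Placeholder values for illustration only, NOT real AWS or GAE fees.
    storage_per_gb_month: float = 0.15
    transfer_per_gb: float = 0.10
    per_request: float = 0.00001


@dataclass
class Workload:
    images: int
    image_mb: float
    metadata_kb: float  # per image, assumed independent of resolution
    views_per_image_month: float


def monthly_cost(w: Workload, p: Prices) -> float:
    """Sum storage, transfer, and per-request charges for one month."""
    image_gb = w.images * w.image_mb / 1024
    metadata_gb = w.images * w.metadata_kb / (1024 * 1024)
    requests = w.images * w.views_per_image_month
    storage = (image_gb + metadata_gb) * p.storage_per_gb_month
    # Simplification: every view transfers the full-size file.
    transfer = requests * w.image_mb / 1024 * p.transfer_per_gb
    ops = requests * p.per_request  # does not depend on file size
    return storage + transfer + ops


prices = Prices()
base = Workload(images=1_000_000, image_mb=2.0, metadata_kb=1.0, views_per_image_month=5)
big = Workload(images=1_000_000, image_mb=8.0, metadata_kb=1.0, views_per_image_month=5)
ratio = monthly_cost(big, prices) / monthly_cost(base, prices)
print(f"cost ratio for 4x file size: {ratio:.2f}")
```

How far the ratio falls below 4 depends entirely on the parameters chosen, which is the reason to fiddle with them.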


I realize that a 4x increase in pixels doesn't mean a 4x increase in file size, but I was discussing this in terms of file size. Unless you store absolutely insanely more metadata per image than we do, your metadata probably amounts to under 2.5% of the image data for hypothesized 2MB files, so maybe 2.05MB goes to 8.05MB, which is basically still a 4x increase. Part of the metadata is on fast (DB) disk, so your costs don't scale absolutely linearly, but bulk cold disk is still scaling up 4x.
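A quick check of that arithmetic, assuming metadata sits at the 2.5% upper bound and stays the same size when the files grow 4x (the variable names are just for illustration):

```python
IMAGE_MB = 2.0
METADATA_FRACTION = 0.025  # upper bound quoted in the comment
SCALE = 4                  # assumed growth in file size

metadata_mb = IMAGE_MB * METADATA_FRACTION      # fixed per image
before_mb = IMAGE_MB + metadata_mb              # image + metadata today
after_mb = IMAGE_MB * SCALE + metadata_mb       # only the image grows
growth = after_mb / before_mb

print(f"{before_mb:.2f} MB -> {after_mb:.2f} MB, growth {growth:.2f}x")
```

Under these assumptions the fixed metadata shaves only a few percent off the 4x.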

To a large extent, I have worked through it for our application (I run IT for a top 100 e-commerce site that does a very substantial amount of uploads in the holiday season; we choose to self-host several dozen TB and have an emergency overflow possibility out to S3 if we fill up our in-house storage).

I can't see how storage costs are meaningfully sub-linear. Bandwidth costs can be sub-linear due to bulk pricing, but are still linear in upload size to a first approximation. (You might argue that you can use 95/5 pricing to work around that by forcing users to schedule their uploads for an off-peak time, but then you could do that in the base case as well.)
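A rough sketch of the off-peak argument, assuming "95/5 pricing" means 95th-percentile (burstable) billing: the top 5% of bandwidth samples are discarded and the highest remaining sample sets the bill. The traffic numbers are invented, and the exact percentile convention varies by provider.

```python
def billable_mbps(samples):
    """Return the highest sample left after discarding the top 5%."""
    ordered = sorted(samples)
    discard = len(ordered) * 5 // 100
    return ordered[len(ordered) - discard - 1]


# Hypothetical traffic: 50 off-peak samples at 100 Mbps, 50 peak at 400 Mbps.
base = [100] * 50 + [400] * 50

# Uploads add 200 Mbps to 20 samples, either during peak or off-peak.
peak_uploads = [100] * 50 + [600] * 20 + [400] * 30
offpeak_uploads = [300] * 20 + [100] * 30 + [400] * 50

for name, samples in [("base", base), ("peak", peak_uploads), ("off-peak", offpeak_uploads)]:
    print(name, billable_mbps(samples))
```

In this toy case, uploads that fit under the existing peak do not change the billable rate, while the same uploads at peak time do.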

I would love to hear more about your surprises in the model, either on HN or privately, as this represents a substantial portion of my budget, and if I'm missing something, I'm not too proud to change course. :)


[I can't find an e-mail address in your profile or the blog that it mentions.]

I was going with 4x file size and assumed that bandwidth for images is linear in file size, aka no bulk discounts.

One of my points is that IOs have a cost too, a cost that is largely independent of file size.

To a first approximation, disk IO capacity is proportional to the number of disks. (Yes, some disks, especially the flash ones, support a lot more IOs/sec than others.) If you're IO-bound, you either have spare disk space or can probably increase the amount of disk space at a sublinear price. (1.5TB drives are <3x as expensive as 500GB.)

For some data transfers, AWS has a "per operation" charge in addition to a bandwidth charge. The latter is proportional to file size but the former is not.
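To illustrate how the two charges interact, here is a small sketch with placeholder prices (not actual AWS fees). It computes the file size at which the flat per-operation charge equals the size-proportional bandwidth charge; below that size the per-operation charge dominates.

```python
# Placeholder prices for illustration only, NOT real AWS fees.
PER_OPERATION = 0.00001   # flat charge per transfer
PER_GB = 0.10             # bandwidth charge per GB transferred


def transfer_cost(file_mb: float) -> tuple[float, float]:
    """Return (per-operation charge, bandwidth charge) for one transfer."""
    return PER_OPERATION, file_mb / 1024 * PER_GB


# File size where both charges are equal.
break_even_mb = PER_OPERATION / PER_GB * 1024

for size_mb in (0.03, 2.0, 8.0):  # roughly a thumbnail, a 2MB file, a 4x file
    op, bw = transfer_cost(size_mb)
    print(f"{size_mb} MB: per-op share {op / (op + bw):.1%}")
```

With these made-up numbers the per-operation share is large for thumbnail-sized files and small for full-size ones; real fee schedules will shift the break-even point.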

My application has a lot of processing costs. Most of them are on metadata, not the images, so they don't grow with image size. I also do a lot of stuff with "thumbed" images - producing them is a function of file size but storing them and moving them around isn't.

My model is different from yours in at least two ways.

(1) I'm estimating some things.

(2) I'm using AWS and GAE prices. (I'm assuming the highest prices because if my app is getting enough use that I'm getting bulk discounts, I've got other problems.)

If I had a model that tracked my actual experience, I wouldn't listen to some bozo on a website....

FWIW, data in a db is significantly larger than the actual data.


OK, thanks. A couple of comments:

Our bulk storage disk sees very little in the way of requested IOPS. (An entire Gbps pipe couldn't fill the IO capacity of 4U of the 1TB SATA drives we use for bulk upload storage, and we have way more than that. ;) ) DB and host disks are another story entirely, and I have to admit that we don't really account for all those costs as "upload related" (by and large, they are not upload-driven), so we have some model inaccuracy there as well. But for every 2MB file we have on SATA, we probably have 20-40K in thumbnails on faster disk and well under 1K in fairly narrow, not heavily indexed DB rows on 2 tables.
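A back-of-the-envelope sketch of why a saturated pipe implies a modest file arrival rate. It assumes decimal units (1 Gbps = 125 MB/s) and ignores protocol overhead; how many disk IOs each file actually costs depends on the filesystem and is not modeled here.

```python
def uploads_per_second(link_gbps: float, file_mb: float) -> float:
    """Upper bound on files per second a fully saturated link can deliver."""
    link_mb_per_s = link_gbps * 1000 / 8  # decimal units, no overhead
    return link_mb_per_s / file_mb


for size_mb in (2.0, 8.0):
    rate = uploads_per_second(1.0, size_mb)
    print(f"{size_mb} MB files: at most {rate:.1f} uploads/sec")
```

Larger files lower the arrival rate further, so under these assumptions the number of files written per second goes down as file size goes up.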

As for listening to a bozo on a website... well, HN is by and large not bozo-filled, it was pretty clear you weren't one and had given some thought to this problem, and I'm more than ready to admit when I might learn something from someone else that could save me/my company money.

It does sound like your app has substantial sub-linear cost components, and I admit ours has some smaller ones as well but that we just don't model them tightly enough to see those components.

Thanks for the info, and I wish you the best in your endeavor(s).


Probably low, since there's already at least one low priced service (zooomr) that has no individual filesize limit. Prices of hard disks have been dropping precipitously, so I think there's an expectation out there that the same cost for online services should happen (gmail's steadily increasing quota also reinforces that). I realize those expectations are probably unrealistic.

In terms of bandwidth, a flickr-like display that shows a smaller size by default mitigates that. What users are getting is the convenience of uploading a full-size file and having it resized to multiple sizes (and the full size is there for those who want to see it).



