Hacker News | a-dub's comments

i'm curious: how does the steady state error rate of a stochastic automated system like this compare with the downtime and errors that come from a (brittle) deterministic bridge that can fail with upgrades? what does the observability look like? (i'm guessing one feature is that the execution log including images/screenshots for each transaction gets saved, which is probably a huge improvement.)

it’s a good q - we experimented a lot with computer use / agentic automation and found that at scale a hybrid solution where the automations run as deterministic code with agents for recovery is the best - running automations as code is faster & cheaper & when you’re doing critical tasks (like updating patient records) you don’t want an agent to potentially mess something up.

writing RPA code used to take a long time - using AI (and its infinite patience) we can write more durable code that covers more edge cases

And since they’re code based it’s pretty straightforward to have agents monitor them and update their code when upgrades to the underlying system happen etc…

for observability - we have workflow execution logs that store text, videos and screenshots so an agent or a human can debug them - lots and lots of webhooks when things break ! (:
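
A minimal sketch of the "deterministic code with agents for recovery" shape described above, not the actual implementation. Every name here (`run_step`, `StepResult`, the `recover` hook) is hypothetical; the recovery hook stands in for whatever agent would inspect the failure and patch state or code before the retry.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepResult:
    name: str
    ok: bool
    recovery_attempted: bool = False
    log: List[str] = field(default_factory=list)  # text-only execution log


def run_step(
    name: str,
    action: Callable[[], None],
    recover: Callable[[Exception], None],
    max_recoveries: int = 1,
) -> StepResult:
    """Run a deterministic action; call the recovery hook only when it raises."""
    result = StepResult(name=name, ok=False)
    for attempt in range(max_recoveries + 1):
        try:
            action()  # fast, cheap, deterministic path
            result.ok = True
            result.log.append(f"attempt {attempt}: ok")
            return result
        except Exception as exc:
            result.log.append(f"attempt {attempt}: failed: {exc}")
            if attempt < max_recoveries:
                recover(exc)  # hypothetical agent hook (assumption)
                result.recovery_attempted = True
    return result
```

The point of the shape is that the agent never touches the happy path: it only runs after a logged failure, and the log records that it was involved.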


that sounds like the way. also keeps runtime costs down. seems like the trick there would be to build (a|on top of) durable rpa librar(y|ies) that the agents and humans work in so that the automation recipes and their automated updates can be quickly skimmed and sanity checked when needed. add in some live automated testing (assuming you can make this happen with the legacy systems) and maybe you could get really close to fully automating all of it.

I also experimented with vision/screenshot based computer use tools for similar use cases but had inconsistent results. LLMs had trouble getting precise pixel coordinates from a screenshot to move a mouse. And the screenshots took extra tokens. I had a lot more success using accessibility APIs to replace screenshots + input simulation since accessibility data is easier for LLMs to process. The accessibility functionality is now released as a separate library for building automation tooling: https://xa11y.dev/

cool! thank you for sharing - will check it out

this is interesting. would be cool to explore something like integrating a vlm to add a "semantic" term to the loss function. looking through the comparisons, some of the baseline codecs create meaningfully different details (as could be described by text) in the images.

watermarking only really works when the scheme is secret.

putting cyphertext in high frequency noise is old news. in generative land it would be far more interesting to use the generative flexibility to encode in macrostructure.


back in the '00s i used a hardware kvm that could be controlled by the keyboard with some weird key combo (~ ~ (1|2)? maybe?). these days i strongly prefer deskflow (oss version of synergy) for this sort of thing or just ordinary remote desktop for the secondary. (depends on the task, if you're just building for the secondary or reading email it doesn't really matter- but if you're developing interactive applications or you need to reboot a bunch or something, then having the physical hardware with a local head can help).


I still use hardware KVMs. Tesmart is OKish, but usually fails within a couple of years. AV Access is on par with it.

Level1Techs are the best but also cost double or triple.


Where I worked in the 00s, they had rack mountable kvm. The clients were just a small box with utp and the peripherals. A double press on ctrl opened a menu and you could choose a server. Neat.


it was embedded/client land for embedded systems that were connected to tvs (had one of those on my desk too). i had a primary windows dev box for wrs tornado and e-mail and then a linux box for ci dev and build/release infra. the kvm also allowed me to switch to whatever engineering sample hw i had on my desk. fun times!


no placebo arm. result could be due to unisolated factors present in clinical trial fixturing.


hm. surprised there aren't idioms like copy_(to|from)_user for these kinds of kernel to userspace mappings for custom device nodes that ensure bounds are supplied...


> Strong Federal privacy laws would make posts like this unnecessary, that’s the world I’d rather live in.

yes. there ought to be a right to reasonable expectation of behavioral privacy where if it's not obvious and intrinsic to function that behavior is being recorded then it must be consented to, with a functional opt-out.

gps tracking to the manufacturer of a car seems egregious. i wonder if it runs afoul of anti-stalking laws.


back in the days when i was doing windows desktop apps, this was the way. installshield was such a piece of hot enterprise garbage that felt like it was designed by management consultants. overengineered, way too complicated, way too many steps to do the simplest of things. it was literally a case study in how not to do software.


i'm curious about how effective path tracking can be in comparison with computer vision based inverse kinematics of the body itself. do all forms of bad form have detectable imu signatures?

i wonder if it would make sense to consider it as a data problem, capture a bunch of high fidelity inverse kinematics data for various forms of bad form/dangerous lifting along with the imu data and then work from there. there could be some interesting and unexpected features that are easier to detect than straying from straight line paths with some tolerance.
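
For reference, the baseline feature mentioned above ("straying from straight line paths with some tolerance") could look roughly like this. It is a sketch under loud assumptions: the path is already a list of 2D positions in meters (getting that from raw imu data means integrating accelerations, which drifts), and the 5 cm tolerance is an invented placeholder, not a validated threshold.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (horizontal, vertical) position in meters (assumed)


def max_lateral_deviation(path: List[Point]) -> float:
    """Largest perpendicular distance from the line joining the first and last points."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        # Degenerate case: start and end coincide, so measure distance from the start.
        return max(math.hypot(x - x0, y - y0) for x, y in path)
    # The cross product magnitude divided by the line length is the perpendicular distance.
    return max(abs(dx * (y - y0) - dy * (x - x0)) / length for x, y in path)


def strays_from_line(path: List[Point], tolerance_m: float = 0.05) -> bool:
    """True if the path leaves the straight-line corridor (tolerance is a made-up default)."""
    return max_lateral_deviation(path) > tolerance_m
```

A learned approach would replace `strays_from_line` with features fit against labeled kinematics data; this only shows what the hand-written rule is measuring.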


For me it's a bit of an inverse problem. I go to a public gym (hard to sustain motivation at home) and I absolutely don't want to film myself there.


so i'm guessing something like this would be caught by (open|little)snitch. the raw c2 post coming from the python process would definitely be a red flag, but i wonder how obvious the git/github activity would be. it would seem kinda weird if it came from the python process itself, but if it were just git or gh in a subprocess, it would possibly look totally normal and even have a temporary allow rule in place...

maybe it's time for a nextgen opensnitch where the rules table is replaced by an active agent that watches connections and the process table?
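
A toy illustration of why the process table matters here: a per-binary allow rule sees only "git", while walking the parent chain shows which process launched it. This is a static heuristic, not the active agent imagined above, and nothing in it is opensnitch's API. `Proc`, the in-memory table, and the name lists are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Proc:
    pid: int
    name: str
    ppid: Optional[int]  # None for the root of the tree


def ancestry(pid: int, table: Dict[int, Proc]) -> List[str]:
    """Process names from the given pid up to the root, guarding against cycles."""
    chain: List[str] = []
    seen = set()
    current: Optional[int] = pid
    while current is not None and current in table and current not in seen:
        seen.add(current)
        proc = table[current]
        chain.append(proc.name)
        current = proc.ppid
    return chain


def worth_a_second_look(
    pid: int,
    table: Dict[int, Proc],
    net_tools: Sequence[str] = ("git", "gh"),               # assumed list
    interpreters: Sequence[str] = ("python", "python3"),    # assumed list
) -> bool:
    """Flag a connection made by git/gh when an interpreter appears among its ancestors."""
    chain = ancestry(pid, table)
    return bool(chain) and chain[0] in net_tools and any(
        name in interpreters for name in chain[1:]
    )
```

Plenty of legitimate tooling shells out to git from python, so this would only be one signal among many rather than a block rule.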

