Lukas Hruby

I help companies ship AI to production — with architecture that lasts, not just a proof of concept.

  • 15+ years in software, 8.5 years as CTO: designed systems from MVP to global production in 102 countries
  • AI to production: real-time inference, LLM integration, MLOps, evaluation, governance — not just PoC
  • Software architecture & scaling: component design, API contracts, observability, cost/performance trade-offs

Northern & Central Europe • Online

I help when

Problems I solve best — because I've solved them before, at scale.

  • You need to integrate AI/LLM into existing systems — auth, data pipelines, APIs, processes. I design architecture that works with what you already have.
  • You want AI that's reliable — evaluation, monitoring, fallback, security guardrails. Not just a demo, but a production system with measurable results.
  • You're dealing with scaling and cost/performance trade-offs — GPU, latency, throughput. I've optimized infrastructure to 10M+ daily requests with 90% cost savings.
  • You need architecture and technical leadership — decision logs, RFCs, standards, code review. I've led a team from 0 to 20 engineers.
  • You have challenges in computer vision / real-time inference — I've processed 1000+ camera streams at sub-second latency with 99%+ accuracy across 102 countries.

Selected Wins

  • Built production ML system processing real-time video from 1000+ cameras with sub-second latency and 99%+ accuracy — used in 102 countries, reduced manual monitoring by 80%
  • Designed reference architecture for AI products — data pipeline → model serving → observability → governance. Systems that run reliably in production, not just in a notebook.
  • Established model evaluation standards — offline eval + online metrics + regression tests. Measurable quality instead of "seems to work".
  • Scaled cloud infrastructure to 10M+ daily API requests with 99.9% uptime — 90% cost savings through GPU workload and architecture optimization
  • Led engineering team growth from 0 to 20 engineers (sped up delivery by 3x, established hiring system, cross-functional processes)
  • Established MLOps practices enabling continuous deployment (reduced time-to-production from weeks to days, 50% faster iteration)
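To make "offline eval + regression tests" from the list above concrete, here is a minimal sketch of a regression gate, assuming higher-is-better metrics. The function name, the metric names, and the tolerance are illustrative assumptions, not the setup used in any specific project.

```python
def regression_failures(baseline, candidate, tolerance=0.01):
    """Return the metrics where the candidate model regressed.

    Hypothetical helper: `baseline` and `candidate` map metric names to
    higher-is-better scores from an offline eval run. A metric missing
    from the candidate counts as a regression.
    """
    return sorted(
        name
        for name, base_score in baseline.items()
        if candidate.get(name, float("-inf")) < base_score - tolerance
    )


# Illustrative numbers only, not measured results.
baseline = {"precision": 0.95, "recall": 0.90}
candidate = {"precision": 0.96, "recall": 0.85}
failed = regression_failures(baseline, candidate)
if failed:
    print("Blocking release, regressed metrics:", ", ".join(failed))
```

A gate like this would typically run in CI before a model is promoted, so "seems to work" is replaced by a pass/fail check against the last accepted baseline.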

What clients say

Their software is better than any in the marketplace right now.

— Enterprise client, transportation sector

What I deliver

Transparency: I'm a co-founder/ex-CTO of GoodVision, so I'll always disclose any potential conflict and suggest alternatives when appropriate. If an existing product is a better fit than custom development, I'll say so openly.

Not sure which fits? Book a free 30-min call and we'll figure it out together.

AI assistants in practice

Intake assistant for a law firm

Context: A law firm receives new inquiries from multiple channels. They are often incomplete, and senior staff spend time on the initial evaluation.

Problem: Poorly framed questions or insensitive communication drives clients away. Key information missing for initial assessment.

What we delivered: An assistant that identifies case type, asks for missing information (structured but human-like), and prepares a summary and materials for the lawyer. Clear boundaries — the assistant doesn't give legal advice, only collects information.

Result: Up to 80% time savings in evaluating new clients. More consistent intake information, less back-and-forth, better client impression.

Safety: Sensitive data minimization, audit log, role-based access, human escalation on uncertainty
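As an illustration of "asks for missing information" and "human escalation on uncertainty", the routing decision could be sketched roughly as follows. The field names, the confidence score, and the threshold are all hypothetical and only show the shape of the logic, not the delivered system.

```python
from dataclasses import dataclass, field

# Hypothetical values chosen for illustration only.
REQUIRED_FIELDS = ("case_type", "contact", "summary")
CONFIDENCE_THRESHOLD = 0.75


@dataclass
class Intake:
    fields: dict = field(default_factory=dict)
    case_type_confidence: float = 0.0  # assumed score from a classifier


def next_step(intake: Intake) -> str:
    """Decide what the assistant does next for one inquiry."""
    missing = [name for name in REQUIRED_FIELDS if not intake.fields.get(name)]
    if missing:
        # Ask the client for what is missing before anything else.
        return "ask:" + ",".join(missing)
    if intake.case_type_confidence < CONFIDENCE_THRESHOLD:
        # Uncertain classification: hand over to a person.
        return "escalate_to_human"
    # Everything collected and classified: prepare materials for the lawyer.
    return "prepare_summary"
```

Note that none of the branches produces legal advice. The assistant only asks, escalates, or summarizes.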

Supplier comparison with assisted process via SMS & WhatsApp

Context: B2B2C model in commodities — many suppliers, many customers, many steps. Users don't want to onboard to another tool.

Problem: Coordination chaos, inconsistent supplier data, communication delays.

What we delivered: A conversational coordinator via SMS/WhatsApp — collects consumer inputs, distributes inquiries to suppliers, tracks process steps, normalizes responses into comparable format. Human intervention only where needed.

Result: Significantly faster turnaround from inquiry to supplier selection. Less manual coordination, higher conversion through familiar channels.

Safety: Input validation, clear rules for customer vs. supplier communication, auditable process

Advisory layer for employee benefits comparison

Context: Benefits comparison tool — users have many preferences but can't translate them into decisions. A data table isn't enough.

Problem: Users see data but don't know "what it means for me". Personalized perspective missing.

What we delivered: An advisory component that creates a clear picture from user inputs and suggests topics worth considering. Recommendations framed as "suggested areas", not hard advice. Transparent "why we recommend this" explanations.

Result: Higher user clarity and confidence. Better engagement — more completed comparisons and higher conversion.

Quality: Continuous tuning on real data and feedback, transparent reasoning

Case Studies

Real-time Computer Vision for CCTV (102 countries)

Context: GoodVision needed real-time video analysis across multiple locations, serving customers in 102 countries.

Problem: Processing 1000+ camera streams with sub-second latency, scaling to handle peak loads, maintaining 99.9% uptime.

What I did: Designed edge processing architecture running on NVIDIA Jetsons, built model serving infrastructure achieving 99%+ detection accuracy, implemented MLOps pipeline for continuous deployment, optimized for GPU economics and edge deployment.

Result: 99%+ accuracy at sub-second latency, 10M+ daily requests handled reliably, cost per stream reduced by 40%.

Stack: AWS, AWS IoT, Docker, NVIDIA Jetson, Jetpack, PyTorch, TensorRT

Cost & performance architecture for GPU workloads

Context: ML workload requiring significant GPU compute with cost and latency constraints.

Problem: Balancing GPU costs, latency requirements, and scalability for variable workloads.

What I did: Architected hybrid cloud solution (on-demand + spot instances), implemented auto-scaling, optimized model inference, established cost monitoring and alerting.

Result: 90% cost reduction while maintaining latency SLAs, automated scaling handled 10x traffic spikes.
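The arithmetic behind an on-demand + spot mix can be sketched as below. The prices and fleet size are placeholder assumptions for illustration, not actual AWS prices or figures from this project, and the sketch ignores spot interruptions and other effects a real cost model has to cover.

```python
def blended_hourly_cost(instances, spot_fraction, on_demand_price, spot_price):
    """Hourly cost of a fleet split between spot and on-demand instances.

    Hypothetical helper: `spot_fraction` is the share of the fleet
    (0.0 to 1.0) that runs on spot capacity; prices are per instance-hour.
    """
    spot_count = round(instances * spot_fraction)
    on_demand_count = instances - spot_count
    return on_demand_count * on_demand_price + spot_count * spot_price


# Placeholder prices: 1.00 per hour on-demand, 0.30 per hour spot.
all_on_demand = blended_hourly_cost(10, 0.0, 1.00, 0.30)
mostly_spot = blended_hourly_cost(10, 0.8, 1.00, 0.30)
savings = 1 - mostly_spot / all_on_demand
print(f"Savings from the spot mix alone: {savings:.0%}")
```

The purchasing mix is only one lever. The overall reduction quoted above also came from auto-scaling and inference optimization.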

Stack: AWS EC2, ECS, CloudWatch, custom cost optimization

Engineering team scaling (0 → 20)

Context: Needed to scale engineering from founding team to support growth across 102 countries.

Problem: Hiring quality engineers, establishing technical culture, building processes for distributed team, maintaining delivery speed.

What I did: Built hiring process and technical interviews, established architecture principles, implemented CI/CD and code review practices, created onboarding system, set up cross-functional collaboration.

Result: Team grew from 0 to 20 engineers across time zones, delivery velocity increased 3x, technical debt managed systematically.

Stack: Hiring processes, technical culture, architecture governance, CI/CD, cross-functional processes

About

15+ years in tech. 8.5 years as CTO & co-founder of GoodVision. Speaker at Stockholm Smart City Expo. MSc from Czech Technical University, Charles University, JKU Linz, and ENSTA ParisTech.

AI & LLM to Production

From use-case identification to deployment: architecture, evaluation, monitoring, guardrails. Not just PoC, but systems that work in production.

Software Architecture & Scaling

Component design, API contracts, observability. Infrastructure handling 10M+ daily requests with 90% cost savings.

Computer Vision & Real-time

Production systems processing real-time video from 1000+ cameras with sub-second latency and 99%+ accuracy across 102 countries.

When to look elsewhere

  • You need a full-time full-stack implementer — my biggest value is direction, architecture, and outcomes. I can lead, but you'll need engineers to build.
  • Success requires writing most of the code — I'll lead strategy and architecture, but we should involve a dev team or agency for implementation.
  • You need deep expertise in a specific framework — I'm not a framework specialist, but I can quickly evaluate and choose the right approach for your problem.

I deliver the most value where business, product, and architecture need to be aligned. Still not sure? Book a free 30-min call — no commitment, we'll figure out if it's a fit.

Ready to talk?

Whether you're integrating AI/LLM, designing production architecture, or need a fractional CTO — let's start with a conversation.

I usually respond within 24 hours.