Senior AI Pipeline Engineer (Python, AWS, LLMs) – Document Intelligence / CLM

Bangalore, Karnataka, India
Temporary
Remote
2,000-4,300 EUR / Month

Job Description:

About Inhubber

Inhubber is a security-first, AI-powered Contract Lifecycle Management (CLM) platform built for organizations with high compliance and data protection requirements. We combine end-to-end encryption and modern cloud architecture with advanced AI to extract, analyze, and generate contract intelligence.

Our platform processes sensitive legal documents for companies worldwide. We are now scaling our AI capabilities and looking for a senior engineer to take ownership of our production AI pipelines and help build the next generation of document intelligence.

The Role

We are looking for a hands-on Senior AI Pipeline Engineer (Python) who will:

Own and optimize our production extraction pipelines.
Deliver new document-analysis pipelines end-to-end.
Build the foundation for next-generation GenAI features (agentic contract drafting & interpretation).

This is a delivery-focused role. You will own the AI kernel (prompts, evaluation logic, structured outputs, validation, model/tool logic), while product engineers orchestrate workflows via stable APIs.

What You'll Do

1. Own & Extend Production Pipelines

Maintain and optimize our Python-based extraction pipelines (AWS Lambda + S3 + Docker components).
Ensure stable document processing and reliable triggering of downstream steps.
Improve observability: logging, metrics, alerting, traceability, cost monitoring.
Debug and stabilize real-world failure modes in production.

2. Deliver New Document Pipelines

Design and implement end-to-end pipelines for new document families.
Build evaluation datasets and regression tests.
Prevent silent quality degradation through measurable metrics.

3. LLM-Based Interpretation & Structured Extraction

Improve Q&A and structured extraction using LLMs.
Implement structured outputs, retrieval (RAG where useful), and deterministic validation.
Add robust failure handling (timeouts, retries, fallbacks, safe defaults).

4. GenAI Foundations

Build agentic building blocks in Python behind stable APIs.
Contribute to a contract-generation/editing kernel (planner, drafter, risk checks).
Collaborate with backend/frontend teams for clean integration.

5. Production Readiness

Ensure scalability, cost-efficiency, and security.
Contribute to deployment/versioning/rollback strategies.
Help define operational runbooks.

Must-Have Skills

Strong production-grade Python (clean architecture, testing, packaging, APIs).
Experience owning code in production.
AWS serverless (Lambda + S3 required; Step Functions/SQS/CloudWatch a plus).
Docker and containerized services.
Proven experience maintaining/debugging automated pipelines.
Hands-on experience with LLMs (OpenAI/Azure OpenAI/Anthropic or similar):
- Structured outputs
- Prompt iteration
- Retrieval (RAG)
- Evaluation approaches

Nice to Have

Document AI experience (OCR, layout extraction, noisy PDFs).
Evaluation-driven development (test sets, regression checks, quality metrics).
Experience with cost/latency budgeting.
Familiarity with TypeScript/Node.
Experience integrating REST services.

First 90 Days – What Success Looks Like

Weeks 1–2

Fully understand current pipeline architecture.
Stabilize staging/local environments.
Define quality, cost, and latency baselines.
Improve logging and monitoring.

Weeks 3–6

Deliver a new production-ready pipeline for a new document family.
Implement evaluation datasets and regression checks.
Deploy with monitoring and rollback strategy.

Weeks 7–12

Measurably improve Q&A and extraction accuracy.
Deliver a first version of a GenAI contract drafting kernel.
Harden operations (cost controls, retries, fallbacks, documentation).

Tech Environment

Frontend: React (TypeScript)
Backend: Java (JEE)
AI Pipelines: Python (AWS Lambda), Dockerized OCR/NLP components
Storage: AWS S3
LLMs: Azure-hosted LLM APIs
Infrastructure: AWS + Azure (hybrid)

Engagement Details

Senior-level role
Long-term collaboration preferred
Strong ownership mindset required
Experience in regulated/sensitive data environments is a strong plus