open to AI / LLM engineering roles

Dhiraj Chaudhary

AI & Data Engineer at KKR


dhiraj@dev:~$ cat about.md

About

I'm an AI and data engineer in KKR's Insurance Data Engineering org, where I grew Core Data Platforms from just me to a team of five. We own the systems the org's data runs on: 10,000+ jobs across ETL, serverless, and the lakehouse.

I shipped the org's first LLM system to production in 2024, before the enterprise sanctioned any AI coding tool. AI is most of my work now: agentic systems that take a task end to end, draft the fix or the answer, and show their work so a person can trust it before it ships.

Outside work, I build and ship my own AI products. It keeps me close to the tools and feeds ideas back into the day job.

ai / llm

agents · multi-agent · LangGraph · Claude Code · Cursor · MCP · RAG · tool-use · evals · prompt engineering · structured outputs · LiteLLM · OpenRouter · Claude · GPT

data

Spark · Scala · RDD · Iceberg · lakehouse · medallion · Hive · ETL

data stores

Redshift · Snowflake · Oracle · DB2 · DynamoDB

cloud

AWS Lambda · Step Functions · Glue · EMR · S3 · Serverless

languages

Python · SQL · Scala · TypeScript · Bash

web

Next.js · React · FastAPI · Flask · Tailwind

dhiraj@dev:~$ ls ~/experience

Experience

KKR (Global Atlantic) · Sr Software / Data Engineer · 2022 – present

automated-incident-resolution · ai

When a production data job fails, this system investigates the way an engineer would: reads the logs, queries the underlying database, checks the job's code and its incident history. Then it triages. Code bug → a drafted GitHub PR. Source-data or platform issue → a drafted email to the owning team, written in that job's templates and language, ready to review and send. The diagnosis and the write-up are done before an engineer opens the incident. Replayed against the last 200 production failures, unfiltered, it classified the root cause (code vs. data vs. infra) correctly about 95% of the time.

claude-code · litellm · agents

~95% root-cause classification · last 200 real failures
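Once the investigation is done, the triage step is essentially a routing decision. A minimal sketch, assuming the agent has already produced a structured diagnosis; `RootCause`, `Diagnosis`, `Draft`, and `triage` are hypothetical names for illustration, not the system's actual code, and the LLM investigation itself is not shown.

```python
from dataclasses import dataclass
from enum import Enum


class RootCause(Enum):
    CODE = "code"
    DATA = "data"
    INFRA = "infra"


@dataclass
class Diagnosis:
    job_name: str
    root_cause: RootCause
    summary: str       # the agent's written diagnosis
    repo: str          # repository holding the job's code (hypothetical field)
    owning_team: str   # team that owns the source data or platform


@dataclass
class Draft:
    kind: str                  # "github_pr" or "email"
    target: str                # repo for a PR, team for an email
    body: str
    needs_review: bool = True  # a person approves before anything is sent


def triage(diag: Diagnosis, email_template: str) -> Draft:
    """Map a diagnosis to the draft artifact a reviewer should see."""
    if diag.root_cause is RootCause.CODE:
        return Draft("github_pr", diag.repo, f"Fix {diag.job_name}: {diag.summary}")
    # Data and infra issues go to the owning team, written in the job's own email template.
    body = email_template.format(job=diag.job_name, summary=diag.summary)
    return Draft("email", diag.owning_team, body)
```

Every path returns a draft with `needs_review=True`, mirroring the "ready to review and send" step rather than sending anything automatically.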

llm-migrations · ai

LLM pipelines I built in 2024 to power a major technology transformation: 2,000+ Sybase stored procedures to Redshift, 500 SAS jobs to Spark, ~1,200 Tableau dashboards repointed and rewritten. This shipped over a year before the enterprise sanctioned any AI coding tool. The hard part was validation: an LLM compared rendered dashboard images, old source vs. new, and reported every visual difference, catching bugs and drift no one had time to spot by eye. Migrations budgeted in months landed in days, saving roughly $1M in labor.

llm · redshift · spark

months → days · ~$1M saved

spark-etl-framework · platform

The Insurance Data Engineering org's standard tool for building data pipelines: 50+ plug-in components (read a database, write a database, create a CSV, send an email, and dozens more) that let an engineer assemble a complex pipeline with almost no bespoke code. Shared logging, incident creation, and central control mean a fix made once propagates to every job that uses the component. 10,000+ jobs run on it today.

spark · scala · platform

10k+ jobs · key runtimes ~80% faster vs. pre-tuning
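The plug-in idea can be sketched in a few lines. This is a simplified Python illustration, not the framework itself (which runs on Spark/Scala): rows are plain dicts, and `component`, `run_pipeline`, the two sample components, and the in-memory `INCIDENTS` list are hypothetical stand-ins. What it shows is that logging and incident creation live in one runner, so a fix there reaches every job.

```python
import logging
from typing import Any, Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

# Hypothetical registry: component name -> callable(data, **config) -> data
COMPONENTS: dict[str, Callable[..., Any]] = {}
INCIDENTS: list[dict] = []  # stand-in for a real incident system


def component(name: str):
    """Register a plug-in component under a config-friendly name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        COMPONENTS[name] = fn
        return fn
    return register


@component("filter_rows")
def filter_rows(data, column, value):
    return [row for row in data if row.get(column) == value]


@component("select_columns")
def select_columns(data, columns):
    return [{c: row.get(c) for c in columns} for row in data]


def run_pipeline(job: str, steps: list[dict], data):
    """Run config-defined steps; shared logging and incident creation happen here, once."""
    for step in steps:
        name = step["component"]
        config = {k: v for k, v in step.items() if k != "component"}
        try:
            log.info("%s: running %s", job, name)
            data = COMPONENTS[name](data, **config)
        except Exception as exc:
            INCIDENTS.append({"job": job, "component": name, "error": repr(exc)})
            raise
    return data
```

A job is then just configuration, for example `[{"component": "filter_rows", "column": "state", "value": "NY"}, {"component": "select_columns", "columns": ["id"]}]`, with no bespoke code of its own.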

tpa-drift-resolver · ai

Insurance partners hand-build their data files in Excel, so columns, types, and date formats drift, and pipelines that expect an exact schema break. This system catches the drift, has an LLM propose the mapping fix as a reviewable diff, and routes it through business, TPA, and support approval. One click later, the file ingests: no code change, no asking the partner to resend. It was one of the first LLM systems I shipped to production, in 2024, and teams keep onboarding new feeds to it; it now clears ~120 files a month across dozens of deals.

2–3 days → same day · ~120 files/mo · growing
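Drift detection and the proposed fix can be sketched as below. In the real system an LLM proposes the mapping; here `difflib` fuzzy matching stands in for it so the example runs offline. `EXPECTED`, `CANDIDATE_DATE_FORMATS`, and `propose_mapping` are hypothetical, and the output is a list of diff-style lines for a reviewer to approve, not an applied change.

```python
import difflib
from datetime import datetime

# Hypothetical expected schema for one feed: column -> expected date format (None = not a date).
EXPECTED = {"policy_id": None, "effective_date": "%Y-%m-%d", "premium": None}
CANDIDATE_DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y"]


def normalize(name: str) -> str:
    return name.strip().lower().replace(" ", "_")


def detect_date_format(value: str):
    """Return the first candidate format that parses the value, else None."""
    for fmt in CANDIDATE_DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def propose_mapping(header: list[str], sample_row: dict) -> list[str]:
    """Compare an incoming header and sample row to EXPECTED; return reviewable fixes."""
    by_norm = {normalize(h): h for h in header}
    proposals = []
    for col, expected_fmt in EXPECTED.items():
        if col in header:
            source = col
        elif col in by_norm:  # same name, different case or spacing
            source = by_norm[col]
            proposals.append(f"~ rename {source!r} -> {col!r}")
        else:  # fuzzy stand-in for the LLM's mapping proposal
            unmatched = [n for n in by_norm if n not in EXPECTED]
            match = difflib.get_close_matches(col, unmatched, n=1, cutoff=0.6)
            if not match:
                proposals.append(f"! {col}: no candidate column found")
                continue
            source = by_norm[match[0]]
            proposals.append(f"~ rename {source!r} -> {col!r}")
        if expected_fmt:
            found = detect_date_format(sample_row[source])
            if found and found != expected_fmt:
                proposals.append(f"~ parse {col} with {found!r}, not {expected_fmt!r}")
    return proposals
```

For a header of `["Policy ID", "Eff Date", "premium"]` with `"03/31/2024"` in the date column, this proposes two renames and a date-format change; a clean file yields an empty list.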

data-quality-agent · ai

Writing good data-quality rules takes someone who knows both the data and the domain, and that expertise doesn't scale, so coverage stayed thin and uneven across most tables. This system fixes it in two layers. First, an agent profiles a table and drafts ~20–30 tailored DQ rules plus exploratory queries (counts, averages, ranges over the columns that matter), each test-run against real data; a reviewer who knows the table then tunes, adds, and prunes them before approving. Then, daily, an LLM reads the full results the way an analyst would: the kind of read that catches a batch of policies whose premium and coverage quietly stopped tracking each other, even though every value is individually valid and no single rule fires. Each run ships a plain-English report plus a JSON summary, ready to route into alerts and dashboards.

POC · ~15 tables · ~20–30 rules/table, agent-drafted + human-tuned
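The rule layer and the JSON summary can be sketched like this. It is a minimal illustration, assuming rules are stored as named row predicates; the three rules, `profile`, and `run_checks` are hypothetical examples of what an agent might draft for a policies table. The daily LLM read is not shown; the JSON is the kind of input it, or an alerting system, could consume.

```python
import json
import statistics

# Hypothetical agent-drafted rules for a policies table: rule name -> row predicate.
RULES = {
    "policy_id_not_null": lambda r: r.get("policy_id") is not None,
    "premium_positive": lambda r: (r.get("premium") or 0) > 0,
    "coverage_at_least_premium": lambda r: (
        r.get("coverage") is not None
        and r.get("premium") is not None
        and r["coverage"] >= r["premium"]
    ),
}


def profile(rows, column):
    """Exploratory query: count, mean, min, and max over one numeric column."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return {"count": 0}
    return {"count": len(values), "mean": statistics.fmean(values),
            "min": min(values), "max": max(values)}


def run_checks(rows, numeric_columns):
    """Run every rule and profile; return a JSON summary for alerts or a reviewer."""
    rules = {}
    for name, rule in RULES.items():
        failed = [i for i, r in enumerate(rows) if not rule(r)]
        rules[name] = {"failed": len(failed), "sample_rows": failed[:5]}
    return json.dumps({
        "row_count": len(rows),
        "rules": rules,
        "profiles": {c: profile(rows, c) for c in numeric_columns},
    }, indent=2)
```

Keeping rules as plain data makes the human review step simple: a reviewer can add, edit, or delete entries in `RULES` without touching the runner.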

deal-codegen · ai

Every new insurance deal needs 15–20 pipeline modules built against the same canonical medallion model. The structure rhymes across deals but never repeats exactly: some are close cousins, others diverge in shape and source, so each one still took weeks of real engineering, not copy-paste. I built this in 2024, before the enterprise sanctioned any AI coding tool, to generate most of those modules, grounded in prior deals' actual code, with a human reviewing before integration. Integration time dropped roughly 60%. When Claude Code and Cursor later caught up, I retired the pipeline and distilled its grounding into skills, so the system is gone but the knowledge still generates the code.

weeks → days (~−60%) · 2024 · retired into Claude Code skills

text2sql · ai

The Insurance Data Engineering org's first Text2SQL system (natural language to SQL), built in 2024. Plain-English questions become SQL grounded in real schemas, column context, and curated sample queries, so output matches actual table structure instead of plausible guesses. It proved the pattern worked on the enterprise's own data, and when the org invested in a production tool, this work was merged into it rather than replaced by it. Its descendant runs across the org today.

2024 · org's first · merged into the org's production tool
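Grounding mostly comes down to what goes into the prompt, plus a cheap check on what comes out. A minimal sketch, assuming a small in-memory catalog; `SCHEMA`, `SAMPLE_QUERIES`, `build_prompt`, and `unknown_tables` are hypothetical, the model call is omitted, and the regex check is a rough heuristic, not a SQL parser.

```python
import re

# Hypothetical catalog: table -> {column: description}
SCHEMA = {
    "policies": {"policy_id": "unique policy key", "premium": "annual premium, USD",
                 "state": "two-letter US state code"},
    "claims": {"claim_id": "unique claim key", "policy_id": "FK to policies",
               "amount": "paid amount, USD"},
}
SAMPLE_QUERIES = [
    ("Total premium by state",
     "SELECT state, SUM(premium) FROM policies GROUP BY state"),
]


def build_prompt(question: str) -> str:
    """Ground the model in real tables, column meanings, and curated examples."""
    lines = ["You write SQL. Use only these tables and columns."]
    for table, cols in SCHEMA.items():
        col_text = ", ".join(f"{c} ({d})" for c, d in cols.items())
        lines.append(f"- {table}: {col_text}")
    lines.append("Examples:")
    for q, sql in SAMPLE_QUERIES:
        lines.append(f"Q: {q}\nSQL: {sql}")
    lines.append(f"Q: {question}\nSQL:")
    return "\n".join(lines)


def unknown_tables(sql: str) -> set[str]:
    """Post-check: table names after FROM/JOIN that are not in the catalog."""
    referenced = re.findall(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", sql, flags=re.IGNORECASE)
    return {t.lower() for t in referenced} - set(SCHEMA)
```

A non-empty result from `unknown_tables` flags a "plausible guess" (a table the model invented) before the query ever runs.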

doc-translation · ai

Translating business documents through an outside vendor or API meant a fee per file, slower turnaround, and a third party in the loop every time. So we built it in-house. The pipeline pulls every text and layout element out of a PDF or PPTX, translates the text through open-source models running inside the VPC, then reassembles a document that looks identical to the source. A round-trip eval loop checks that meaning survived, not just grammar. It shipped to production: an on-demand way for teams to work across a language barrier, with the data never leaving our environment.

EN ↔ JA · zero data leaves the VPC · in production

json-ingestion · platform

An existing JSON ingestion pipeline that swallowed 300–400k tiny files per load was taking 17–18 hours, sometimes a full day. I rebuilt the slow parts: an RDD used as a distributed work queue to move files (5 hours down to 2 minutes), plus RDD-level normalization that fixed list-vs-dict key drift before the DataFrame ever existed. Today 95% of runs finish in 10–15 minutes.

17–18 hrs → 10–15 min · 300k+ files/load
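The list-vs-dict fix is the kind of normalization that can run on plain Python records before Spark ever infers a schema. A minimal sketch, assuming drift shows up as a known set of keys that are sometimes a list of objects and sometimes a bare object or null; `LIST_KEYS`, `normalize`, and `copy_one_file` are hypothetical. The Spark wiring at the bottom is illustrative and not executed here.

```python
from typing import Any

# Hypothetical: keys that should always hold a list of objects,
# but that some files emit as a bare object (or null).
LIST_KEYS = {"coverages", "drivers"}


def normalize(record: Any) -> Any:
    """Recursively fix list-vs-dict drift so every file yields the same shape."""
    if isinstance(record, dict):
        fixed = {}
        for key, value in record.items():
            value = normalize(value)
            if key in LIST_KEYS:
                if value is None:
                    value = []        # missing list -> empty list
                elif isinstance(value, dict):
                    value = [value]   # bare object -> one-element list
            fixed[key] = value
        return fixed
    if isinstance(record, list):
        return [normalize(item) for item in record]
    return record


# How this might plug into Spark (illustrative, not executed here):
#   sc.parallelize(file_paths, numSlices=200).foreach(copy_one_file)  # RDD as a work queue
#   raw = sc.wholeTextFiles(input_dir).values()                        # one string per small file
#   fixed = raw.map(json.loads).map(normalize).map(json.dumps)
#   df = spark.read.json(fixed)                                        # DataFrame built after the fix
```

Fixing shape at the RDD level means schema inference sees one consistent type per key, instead of conflicting struct-vs-array types across files.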

serverless-platform · platform

One of the lead architects of the Insurance Data Engineering org's move off always-on EC2 onto Glue, Lambda, and EMR. I designed the architecture, proved feasibility, built the v0, then helped teams across the org migrate onto it, building custom tooling where AWS fell short. Automation utilities saved 8,000+ engineering hours, and rule-based alerting cut error-detection time ~40%.

8k+ engineering hrs saved · detection time −40%

data-fabric · platform

A single, central home for the Insurance Data Engineering org's data. Instead of scattered silos, every source database (Oracle, Snowflake, DB2, Redshift) lands in one unified, cost-efficient store on Apache Iceberg, with centralized control over all of it. That's the foundation that makes AI, ML, and analytics across the org possible: one place to store, govern, and query. 500+ tables consolidated so far, with thousands more to come.

500+ tables · 4 teams · one central store, scaling to thousands

variance-tracker · platform

A full-stack web app for tracking and signing off data variances across multiple databases, built solo in Flask and shipped to AWS after passing the enterprise Architecture Review Board. 50+ stakeholders rely on it, and it cut the variance approval cycle by ~70%.

50+ stakeholders · approval cycle −70%

Tarifica · Software Engineer Intern · 2021

Rebuilt and maintained 100+ Python telecom-data scrapers, lifting data quality ~25%.
Tuned the ETL pipeline and built Flask REST APIs that sped up reporting for analysts and clients.

dhiraj@dev:~$ ls ~/projects

Projects

AI products and tools I build outside work. Most are live on the internet.

Get New Resume · live

Most AI resume tools 'help' by inventing experience you don't have. This one won't. A resume and a job description go in; a tailored resume, an ATS match score, and a cover letter come out. A 4-stage, multi-model LLM pipeline with typed, validated outputs does the work under a hard zero-fabrication constraint.

Next.js · Lambda · OpenRouter · DynamoDB

getnewresume.com

TimeBrew · live

Personalized AI news briefings in a 'Morning Brew' voice. A 3-stage Step Functions pipeline (curator → editor → dispatcher) across 14 Lambdas, with timezone-aware scheduling, at about $0.02 per briefing.

Step Functions · Perplexity · GPT-4 · SES

timebrew.news
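Timezone-aware scheduling usually means storing each reader's local send time and converting it to UTC per day, so daylight-saving shifts are handled automatically. A minimal sketch using the standard-library `zoneinfo` module; `next_send_utc` is a hypothetical helper, not TimeBrew's actual scheduler.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def next_send_utc(now_utc: datetime, tz_name: str, local_send: time) -> datetime:
    """Next UTC instant at which the reader's local clock shows `local_send`."""
    tz = ZoneInfo(tz_name)
    local_now = now_utc.astimezone(tz)
    candidate = datetime.combine(local_now.date(), local_send, tzinfo=tz)
    if candidate <= local_now:
        # Today's slot has passed: use tomorrow, with tomorrow's UTC offset (DST-safe).
        candidate = datetime.combine(local_now.date() + timedelta(days=1), local_send, tzinfo=tz)
    return candidate.astimezone(timezone.utc)
```

For a New York reader at 7:00 local, a 7:00 briefing computed at 12:00 UTC on 2024-03-09 lands at 11:00 UTC the next day, because US daylight saving time starts on 2024-03-10.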

InspireInbox · live

A personal-growth companion that emails you motivation matched to your goals, then learns from what lands. You set the goals; it generates and schedules content toward them, collects feedback on each send, and tunes the next ones from that signal.

FastAPI · React

inspireinbox.com

Notion Crafts · live

Notion can't show a live clock or a running timer on its own. This fills the gap: a web app of embeddable widgets (clocks, timers, counters) and icon packs you drop straight into a page, on a Python/Flask backend on EC2 behind Nginx and Cloudflare.

Flask · EC2 · Nginx

notioncrafts.com

jotted · wip

A keyboard-first 'mission control' for running many AI agents at once: drag-and-drop task lanes per agent, a Cmd-K command palette, and timeline, radar, and kanban views over your runs. Built it to manage my own parallel Claude runs.

Next.js · Zustand · TypeScript

dhiraj@dev:~$ ls ~/research

Research

Undergrad work, where the math and CS foundation under everything else started.

Flood-It solver · presented at the Mathematical Association of America

Designed and implemented a Python algorithm that predicts the minimum number of moves needed to flood the grid in the Flood-It game.

rb.gy/lnifd ↗
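For context on the problem itself: each move recolors the region connected to the top-left cell, and the goal is to flood the whole grid in as few moves as possible. Finding the exact minimum is known to be NP-hard in general, so the sketch below is not the presented algorithm (which isn't described here). It is a brute-force breadth-first search over board states that gives the exact answer for tiny boards; `flood_region`, `recolor`, and `min_moves` are hypothetical names.

```python
from collections import deque

Grid = tuple[tuple[int, ...], ...]


def flood_region(grid: Grid) -> set[tuple[int, int]]:
    """Cells connected to the top-left corner through its current color."""
    rows, cols = len(grid), len(grid[0])
    color = grid[0][0]
    seen = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen and grid[nr][nc] == color:
                seen.add((nr, nc))
                stack.append((nr, nc))
    return seen


def recolor(grid: Grid, new_color: int) -> Grid:
    """Apply one move: paint the flooded region with `new_color`."""
    region = flood_region(grid)
    return tuple(
        tuple(new_color if (r, c) in region else v for c, v in enumerate(row))
        for r, row in enumerate(grid)
    )


def min_moves(grid) -> int:
    """Exact minimum number of moves via BFS over board states (tiny boards only)."""
    start = tuple(tuple(row) for row in grid)
    colors = {v for row in start for v in row}
    total = len(start) * len(start[0])
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        state, moves = frontier.popleft()
        if len(flood_region(state)) == total:
            return moves
        for color in colors - {state[0][0]}:
            nxt = recolor(state, color)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, moves + 1))
    raise ValueError("unreachable: every board can be flooded")
```

Because BFS explores boards in order of move count, the first fully flooded board it reaches gives the minimum; the state space grows exponentially, which is why practical solvers rely on heuristics instead.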

Bio-robotics · presented at the SJC Undergraduate Research Symposium

Combined artificial and biological structures into a hybrid robotic design that outperforms conventional single-structure robots.

rb.gy/8j3j9 ↗

dhiraj@dev:~$ ./contact.sh

Get in touch

Open to AI / LLM engineering roles, and always up for a hard problem.

dhirajc963@gmail.com

$ cat contact.json

{
  "email": "dhirajc963@gmail.com",
  "github": "github.com/dhirajcdry",
  "linkedin": "linkedin.com/in/dhiraj-kumarcdry",
  "location": "New York City"
}