Research Scientist, Stanford Computer Science
AI is doing to knowledge work what machines once did to muscle, compressed from generations into years. The comforting story is that we will all just move up a level of abstraction. I study whether that story is true.

Featured


How I got here
I've always gone my own way, and it's brought me a lot of joy. I've been a founder, operator, athlete, and researcher.
Age 14
School didn't feel very useful, so I left during 8th grade.
Age 14–18
Nobody takes a 14-year-old seriously on a call, so I sold online before e-commerce was a thing.
2011
Skipped high school, scored well on the SAT, and learned by reading Wikipedia.
2013
Became Master of Sport of Russia, a title awarded for national-champion-level athletic results.
2016
Studied Operations Research applied to the physical world: optimization, linear & non-linear programming, queueing theory, and simulation.
2017–2022
Led transformation work touching 2,500 engineers across 25+ countries, with measured productivity, NPS, and EBIT impact.
2022–2025
Spent my days doing research, building things, and sitting in on as many CS classes as I could.
2023–Now
Most of my work now turns research into tools for measuring AI, improving engineering productivity, and understanding how human work is changing.
December 2024
The ghost-engineers finding landed on the cover of The Washington Post's business section, was reshared by Elon Musk and Marc Andreessen, and was covered by 100+ outlets worldwide.
Now
With data from hundreds of companies.
Awards
Teaching
Courses I Enjoyed
Deep dive
I studied private Git data from more than 50,000 engineers across hundreds of companies. About 9.5% did almost no measurable work, less than one tenth as much as a typical engineer.
The estimate doesn't come from counting commits. A model scores every commit the way a panel of ten expert reviewers would: how hard was the work, how maintainable is it, how much value does it add. Counting commits only catches people who do nothing. Scoring the work catches people who commit a lot of nothing.
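A minimal sketch of the difference between counting commits and scoring them. It is an illustration only, not the actual model or pipeline: the names, the made-up scores, and the use of the median as the "typical engineer" are all assumptions. In the research the scores come from a model; here they are typed in by hand.

```python
from statistics import median

# Hypothetical per-engineer commit scores (arbitrary units, invented for illustration).
commit_scores = {
    "alice": [8.0, 12.0, 10.0],
    "bob": [9.0, 11.0],
    "carol": [0.1, 0.2, 0.1, 0.1, 0.2, 0.1],  # many commits, very little scored work
    "dave": [],                                # no commits at all
}


def flag_by_commit_count(scores_by_engineer, min_commits=1):
    """Counting commits: flags only people with fewer than min_commits."""
    return sorted(
        name for name, scores in scores_by_engineer.items()
        if len(scores) < min_commits
    )


def flag_low_output(scores_by_engineer, threshold=0.10):
    """Scoring the work: flags anyone whose total scored output is below
    `threshold` times the median engineer's total (median is an assumption)."""
    totals = {name: sum(scores) for name, scores in scores_by_engineer.items()}
    typical = median(totals.values())
    return sorted(
        name for name, total in totals.items()
        if total < threshold * typical
    )


print(flag_by_commit_count(commit_scores))  # only the engineer with zero commits
print(flag_low_output(commit_scores))       # also the engineer who commits a lot of nothing
```

With this toy data, counting flags only `dave`, while scoring flags both `carol` and `dave`, which is the gap the paragraph above describes.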
The finding made the cover of The Washington Post's business section, sparked a global debate about remote work and measurement, and was amplified by Elon Musk. The strongest validation came from the companies themselves: when they checked the engineers we flagged, the ghosts were real.
Selected high-signal coverage from a wider set of 100+ outlets worldwide.
Deep dive
We measure what AI does to software output across 100,000 developers at hundreds of companies. For most of the AI boom the answer held: gains that are real, below the sales pitch, and uneven across tasks and teams. In December 2025 the answer started to change.
The same expert-panel model scores every commit on time, quality, maintainability, and complexity, then tracks output as teams adopt AI. Through 2025 the average lift stayed smaller than the headlines, and it depended on the task, the age of the codebase, and how common the language was.
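One way to picture "tracks output as teams adopt AI" is a before/after comparison of scored output, split by a factor such as codebase age. The sketch below is a toy illustration under stated assumptions: the record layout, the group labels, and the numbers are hypothetical, and a real analysis would need controls that this simple ratio of means does not have.

```python
from collections import defaultdict

# Hypothetical records: (group, period, scored output per developer-week).
# Groups and values are invented; they are not results from the study.
records = [
    ("new codebase", "before", 10.0),
    ("new codebase", "before", 10.0),
    ("new codebase", "after", 13.0),
    ("new codebase", "after", 12.0),
    ("old codebase", "before", 8.0),
    ("old codebase", "after", 8.4),
]


def lift_by_group(rows):
    """Return the relative change in mean scored output (after / before - 1)
    for each group that has data in both periods."""
    values = defaultdict(lambda: {"before": [], "after": []})
    for group, period, value in rows:
        values[group][period].append(value)

    lifts = {}
    for group, periods in values.items():
        before, after = periods["before"], periods["after"]
        if not before or not after:
            continue  # cannot compare without both periods
        before_mean = sum(before) / len(before)
        after_mean = sum(after) / len(after)
        if before_mean == 0:
            continue  # relative lift is undefined for a zero baseline
        lifts[group] = after_mean / before_mean - 1.0
    return lifts


for group, lift in lift_by_group(records).items():
    print(f"{group}: {lift:+.1%}")
```

Reporting the lift per group rather than one pooled number is the point of the sketch: an average across all teams can hide that one kind of codebase gains far more than another.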
Most companies can't tell whether their AI investment pays off, because their metrics can't see it. That measurement gap, and the distance between teams that master AI and teams that don't, is the throughline of the work.
December 2025 marked an inflection in the data. Through the peak of the hype I kept saying we weren't there yet, and the numbers backed me up. I was right about the call and wrong about the clock: the shift arrived long before I expected it.

The AI-productivity work is now cited and discussed across institutional, enterprise, investor, podcast, and engineering-leadership channels.
15 papers
Shows that language models can agree and still be wrong, so truthfulness needs real verification rather than voting or self-consistency.
Introduces RAMP, a way to score how ready repositories are for coding agents, and links stronger setup with fewer quality problems after adoption.
Studies how duplicated training data affects models differently as they scale, showing that repetition can change performance in ways averages hide.
Revisits V-information and shows it can behave strangely under realistic modeling limits, making it risky to treat as a simple information measure.
Shows that repeating data inside a training set can damage language models, even when the repeated examples look harmless at first.
Explains how generative benchmarks can be contaminated by test data, making models look better than they really are.
Tracks how weak sampling evaluations passed through major papers, turning shaky claims into later assumptions that other work built on.
Rechecks claims that LLM answers collapse into one narrow style, and finds much more diversity across topics, models, and prompts.
Creates a benchmark for testing whether coding agents can produce Lean 4 proofs that verify software end to end.
Tests whether an agent's Lean proof actually matches the original Python code, exposing proofs that are correct but about the wrong thing.
Proposes a competition where researchers predict evaluation results before they are run, making AI benchmarks harder to game after the fact.
Argues that machine learning conferences need a formal place for critiques and corrections, so important mistakes can be reviewed openly.
Measures how much LLM code-review outputs vary across runs, which matters when teams want reliable and repeatable review results.
Reexamines min-p sampling and finds that its claimed benefits are not supported by the available evaluation evidence.
Shows that models can predict expert code-review scores, making it cheaper to estimate code quality across large datasets.
Authored policy analysis and columns translating the research on AI, productivity, language, and remote work for government and public audiences.
Strategic AI policy analysis commissioned for Kazakhstan's official diplomatic channel at the United Nations.
A public case for Kazakhstan's AI opportunity, grounded in productivity data from nearly 100,000 developers across 500+ companies.
Russian-language column on how Kazakhstan can keep its emerging lead in AI-driven software productivity.
Spanish-language analysis of AI, ghost workers, and the changing productivity model in Silicon Valley.
Argument that AI's English-language bias creates a structural productivity disadvantage for Spanish-speaking economies.
Spanish op-ed on remote work, ghost workers, and why better measurement should protect merit rather than become surveillance.
Polish business op-ed on AI productivity gains, rework, language effects, and what local firms need to get right.
Polish-language piece explaining ghost engineers, remote-work structure, and data-driven productivity measurement.
Proof
Coverage, institutional uptake, and where the work is presented.
The AI-productivity work now appears in policy documents, enterprise settings, and official events, not only in press coverage.
A newer coverage cluster focuses on whether AI actually improves developer productivity and whether firms can prove ROI.
The ghost-engineers finding travelled far beyond U.S. media, with confirmed coverage across Europe, Asia, Latin America, and post-Soviet tech press.











More confirmed outlets and references
Research reshared by Elon Musk and discussed by Marc Andreessen and Patrick McKenzie, carrying it into wider public debate.










A graduate course on evaluating, benchmarking, and understanding AI systems: predictive measurement, validity and reliability, benchmark design, and governance.
Course page →
Off the clock


I've been lifting for most of my life. National champion, Master of Sport, and still under the bar despite a long list of injuries.
It started at 13 with coding bots for video games. Watching the bots play was more fun than playing myself, and I've been automating things ever since.
Experimenting with ways to push the human body beyond its limits. Every limit I've tested so far has moved.
Two wheels and four. Drawn to speed, machines, and the open road. The faster you go, the quieter it gets.
Contact
If you study how work is changing, build in this space, or want to, get in touch.