# Measuring AI Productivity With Actual Numbers
This is the first article of a series about using AI efficiently, with a focus on actionable and practical advice backed by real data. In this first installment we'll look at how to actually measure the time you gain (or lose) when working with AI, using real session logs from actual work.
The twist?
This article is partially being written[^1] by the very AI system it's documenting, while working on the tools to measure its own productivity.
Meta enough for you?
## The idea

If there's one thing missing from most discussions about AI productivity, it's actual numbers. People say "AI makes me 10x faster" or "AI is useless", but rarely back it up with data.

My approach is simple: I've been using Claude Code as a programming assistant for several weeks across multiple projects (a retro game, websites, a game jam, an SDK), and all of this work is logged with timestamps down to the second.
If we can extract and analyze these logs, we can measure exactly how long tasks took, what the AI did versus what I did, and whether it actually saved time.
## Why Claude?

A fair question before we go further: why am I using Claude specifically?

The honest answer is that my employer got us a Claude subscription, and I've been using it on my personal projects to experiment, discover, and learn how to use it more efficiently. What I learn on evenings and weekends, I share with my colleagues and boss — so everybody benefits.
I have no experience with Google Gemini, OpenAI's coding tools, JetBrains Junie, or other AI coding assistants. So take everything in this article with a grain of salt — the workflows, timings, and conclusions may not apply one-to-one to whatever system you're using. That said, the general approach of measuring your own productivity with session logs should work regardless of the tool.
## How it started

Here's the fun part: the process of building the measurement tool is itself an example of AI-assisted work, so let me walk you through exactly what happened.

I opened a conversation with Claude and explained I wanted to create a series of articles about using AI efficiently. I mentioned that I had an existing session log from a game jam (the S.C.R.A.P. project, where I spent an entire Saturday building a PICO-8 game with Claude's help) and that I wanted a reusable tool to generate similar logs from all my AI sessions.
Here is the exact prompt I gave Claude:
> If you go to the S.C.R.A.P. session log you'll see a very long session log of me working with Claude Opus on a game jam on a Saturday a couple weeks ago.
>
> This session log was actually generated by Claude from the artifacts located in the .claude installation folder, by analyzing the format to figure out how these sessions were done, and then creating a .md file from it.
>
> What I would be interested in is if we could make such a "script" (or command, or skill or whatever), ideally a reusable one, that can go through all these files and create session logs for what I've done, so I can explain how I did this or that.
>
> Ideally these should be time stamped and named in a way that identifies the type of work, like what we are doing here is related to the blog, but yesterday I did some work on the OSDK, the day before on the Defence Force main site, some days it was work on my Encounter game, etc...
>
> The logs will probably contain a lot of noise, but that can be filtered out later.
>
> Do you think you could help with that?
Within a couple of minutes of back and forth, Claude had:
- Explored the .claude folder structure on my machine to find where session data is stored
- Identified the .jsonl format used for session logs (one JSON object per line, with timestamps, message types, and content)
- Mapped out all my project folders (blog, OSDK, Encounter, Defence Force, SCRAP, GameLauncher...)
- Analyzed the content types in assistant messages (text, thinking, tool_use)
- Proposed a plan for a Python script to extract and format everything
The whole research phase took about 5 minutes. No documentation was consulted, no API reference was needed — Claude simply looked at the files, figured out the format, and proposed a solution.
## What the data looks like

Claude Code stores its session data in ~/.claude/projects/, organized by project directory. Each project folder contains .jsonl files (one per session), where each line is a JSON object with:

- A type field: user, assistant, queue-operation, file-history-snapshot, or progress
- A timestamp in ISO 8601 format (precise to the millisecond)
- A message object containing the actual conversation content
- Metadata like the working directory, session ID, and git branch
For assistant messages, the content can include:
- text — what Claude says to you
- thinking — internal reasoning (the "thought process")
- tool_use — actions like reading files, editing code, running commands
Here is what a typical user message entry looks like:
```json
{
  "type": "user",
  "timestamp": "2026-03-07T08:15:50.203Z",
  "sessionId": "3625fb64...",
  "message": {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "So today I need your help to work on a GameJam..."
      }
    ]
  },
  "cwd": "d:\\Git\\pico8\\scrap",
  "gitBranch": "HEAD"
}
```

And the corresponding assistant response:
```json
{
  "type": "assistant",
  "timestamp": "2026-03-07T08:16:09.456Z",
  "message": {
    "model": "claude-opus-4-6",
    "role": "assistant",
    "content": [
      { "type": "thinking", "thinking": "(internal reasoning)" },
      { "type": "text", "text": "This is a solid concept..." },
      { "type": "tool_use", "name": "Read", "input": "..." }
    ],
    "usage": {
      "input_tokens": 3,
      "output_tokens": 623,
      "cache_read_input_tokens": 5761
    }
  }
}
```

Notice the usage block — every assistant response includes token counts, which means we can compute exact costs for every single interaction.
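Since a script will be walking every message anyway, turning those token counts into dollars takes only a few lines. Here is a minimal sketch in Python; the rates are assumptions for an Opus-class model, so check Anthropic's current pricing page before trusting the output:

```python
# Per-million-token rates in USD. These are assumed Opus-class figures;
# substitute whatever Anthropic currently publishes for your model.
RATES = {
    "input_tokens": 15.00,
    "output_tokens": 75.00,
    "cache_read_input_tokens": 1.50,  # cached input is billed at a discount
}

def message_cost(usage: dict) -> float:
    """Theoretical API cost of a single assistant message, in dollars."""
    return sum(usage.get(key, 0) * rate for key, rate in RATES.items()) / 1_000_000

# The usage block from the example above:
print(f"${message_cost({'input_tokens': 3, 'output_tokens': 623, 'cache_read_input_tokens': 5761}):.4f}")
```

Summing this over every assistant message in a session gives the kind of "estimated cost" figure you'll see in the logs later in this article.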
Here's a quick summary of the data I've accumulated:
| Project | Size | Description |
|---|---|---|
| Encounter | 288 MB | Retro game for the Oric |
| SCRAP | 23 MB | PICO-8 game jam (one Saturday) |
| OSDK | 18 MB | Oric SDK |
| Defence Force | 7.1 MB | Main website |
| Blog | 5.1 MB | This very blog |
| OSDK Website | 4.4 MB | OSDK documentation site |
| GameLauncher | 1.7 MB | Game launcher tool |
That's a lot of timestamped data to work with.
## Building the extraction tool

The next step was to build a script that can parse all of this data and generate readable session logs. I asked Claude to write a Python script with the following requirements (a simplified sketch follows the list):

- Scan all project folders under ~/.claude/projects/
- Parse each .jsonl session file
- Generate a timestamped markdown log per session, grouped by project
- Include user messages, Claude's responses, and summaries of tool usage
- Filter out noise (queue operations, snapshots, internal bookkeeping)
- Estimate token costs based on current Anthropic pricing
- Name the output files meaningfully based on the session's topic
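To give a feel for the overall shape, here is a heavily simplified sketch of the scanning and parsing loop. This is not the actual script, just the core idea (assuming Python 3.9+):

```python
import json
from pathlib import Path

PROJECTS_DIR = Path.home() / ".claude" / "projects"

def iter_sessions():
    """Yield (project name, session file, conversation entries) for every session."""
    for project in sorted(PROJECTS_DIR.iterdir()):
        if not project.is_dir():
            continue
        for session_file in sorted(project.glob("*.jsonl")):
            entries = []
            with session_file.open(encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        entries.append(json.loads(line))
                    except json.JSONDecodeError:
                        continue  # tolerate the odd malformed line
            # Keep the conversation, drop queue operations and snapshots.
            entries = [e for e in entries if e.get("type") in ("user", "assistant")]
            yield project.name, session_file, entries
```

From there, generating a markdown log is mostly a matter of walking the entries in timestamp order and formatting each one.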
That last point turned out to be an interesting challenge on its own.
## The file naming problem

The first version of the script just used session IDs in the filenames, like 2026-03-07_SCRAP_3625fb64.md. Not very helpful when you're browsing through 28 session logs.

So we added topic extraction: the script looks at the first meaningful user message and uses it to name the file. Simple, right? Not quite.
The first attempt just grabbed the first few words, giving us gems like id-like-to-add-a-bit.md — because the message started with "I'd like to add a bit more information to the support email...". The important part was "more information to the support email", not "I'd like to".
What followed was an iterative refinement process:
- Strip preamble — remove common openers like "I'd like to", "Could you please", "So today I need your help to" to get to the actual intent
- Handle plan sessions — when the message is "Implement the following plan:", grab the plan's title heading instead
- Strip URLs and paths — a message like "Check Chema's post here https://forum.defence-force.org/..." should keep the "Check Chema's post" part, not waste the filename on the URL
- Increase slug length — Windows file names can be up to 255 characters, so we went from 40- to 80-character slugs for better readability
- Skip empty sessions — sessions with fewer than 5 entries are probably abandoned attempts and get filtered out
The result: not perfect — natural language is messy and sometimes the real intent is buried in the second or third message — but good enough to quickly find what you're looking for.
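For illustration, the preamble-stripping and slugging logic boils down to something like this (a simplified, hypothetical version; the real script has a longer list of openers and more special cases):

```python
import re

# Common openers to strip. Illustrative subset, not the script's actual list.
PREAMBLES = [
    r"i'?d like to\s+", r"could you please\s+", r"can you\s+",
    r"so today i need your help to\s+", r"please\s+",
]

def make_slug(message: str, max_len: int = 80) -> str:
    """Turn the first user message into a filename-friendly slug."""
    text = message.strip().lower()
    for pattern in PREAMBLES:
        text = re.sub(r"^" + pattern, "", text, count=1)
    text = re.sub(r"https?://\S+", "", text)            # strip URLs
    text = re.sub(r"[^a-z0-9]+", "-", text).strip("-")  # keep filename-safe chars
    return text[:max_len].rstrip("-")

print(make_slug("I'd like to add a bit more information to the support email..."))
# -> "add-a-bit-more-information-to-the-support-email"
```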
## Sample log

Each generated session log starts with a summary table showing the duration, model used, number of messages and tool uses, token counts, and an estimated cost. Then comes the full conversation log with timestamps.

Here's an example from a quick bug fix session on the GameLauncher project:
```markdown
# GameLauncher — Session Log (2026-03-18)

Session ID: cab2ad53...

## Session Summary

| Metric | Value |
|---|---|
| Duration | 5m 39s |
| Model(s) | claude-opus-4-6 |
| User messages | 6 |
| Assistant messages | 17 |
| Tool uses | 7 |
| Estimated cost | $1.45 |

## Conversation Log

**[11:28:06] User:**
hi

**[11:28:36] User:**
In the dialog there's a link to call the user manual,
it properly shows the english and french pages,
but when selecting norwegian it opens the english one

**[11:28:47] Claude:**
The bug is clear. For HyperlinkManual the code only checks
for French — Norwegian falls through to the default English URL.

**[11:28:54]** Tools: Edit: Settings.cpp
```

5 minutes and 39 seconds to find and fix a localization bug. That's the kind of data point we'll be analyzing.
## The script

The full Python script is available here: generate_session_logs.py

It requires Python 3 and no external dependencies. Usage:
```bash
# Process all projects
python generate_session_logs.py

# Filter by project name
python generate_session_logs.py --project SCRAP

# List available projects and sessions
python generate_session_logs.py --list

# Custom output directory
python generate_session_logs.py --output ~/my-logs
```

You'll want to customize the PROJECT_NAMES dictionary at the top of the script to give friendly names to your own project folders.
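The mapping itself is just a dictionary. The keys below are made up for illustration; use the actual folder names you see under ~/.claude/projects/ on your machine (they are derived from the project paths):

```python
# Map Claude Code's project folder names to friendly display names.
# These keys are illustrative -- copy the real folder names from your
# own ~/.claude/projects/ directory.
PROJECT_NAMES = {
    "d--Git-pico8-scrap": "SCRAP",
    "d--Git-osdk": "OSDK",
    "d--Git-blog": "Blog",
}
```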
## First results

Running the script on all my sessions produced 28 logs across 10 projects. Here are some concrete numbers from real tasks:

| Project | Task | Active time | Idle time | User msgs | AI msgs | Cost |
|---|---|---|---|---|---|---|
| GameLauncher | Fix Norwegian manual link | 5m 9s | 0s | 6 | 17 | $1.45 |
| GameLauncher | Add system info to support emails | 25m | 40m | 13 | 71 | $2.01 |
| OSDK | Fix Link65 character replace bug | 4m 29s | 0s | 3 | 32 | $2.24 |
| OSDK | Fix bas2tap FOR loop label bug + sample | 6m 51s | 40m | 4 | 37 | $6.31 |
| Blog | Gitignore, slugs, RSS, OpenGraph | 1h 28m | 3h 15m | 52 | 209 | $30.94 |
| PICO-8 | Full game jam: S.C.R.A.P. | 6h 9m | 4h 58m | 164 | 1357 | $230.17 |
The active time is computed by summing the time between consecutive messages, but excluding any gap longer than 5 minutes (which likely means I was away, testing something, or taking a break). The idle time is the sum of those excluded gaps.
For example the bas2tap session shows 46 minutes wall-clock, but only about 7 minutes of actual interaction — the rest was me doing something else before coming back with the actual request. Similarly in the SCRAP game jam, there's a gap from 17:36 to 18:59 — that's when I was having dinner and watching TV, clearly not programming. Without filtering that out, the session would look almost twice as long as it really was.
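Computing that split is simple once you have the timestamps. A minimal sketch, assuming ISO 8601 timestamps like the ones shown earlier and Python 3.9+:

```python
from datetime import datetime, timedelta

GAP_THRESHOLD = timedelta(minutes=5)

def active_and_idle(timestamps: list[str]) -> tuple[timedelta, timedelta]:
    """Split a session's wall-clock time into active and idle portions.

    Gaps between consecutive messages longer than GAP_THRESHOLD count
    as idle (away from keyboard); shorter gaps count as active work.
    """
    times = [datetime.fromisoformat(t.replace("Z", "+00:00")) for t in timestamps]
    active = idle = timedelta()
    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        if gap > GAP_THRESHOLD:
            idle += gap
        else:
            active += gap
    return active, idle
```

The 5-minute threshold is arbitrary; tune it to your own working rhythm.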
A few observations:
- Quick bug fixes are incredibly fast — the Norwegian manual link was found and fixed in about 5 minutes, the Link65 bug in under 5. These are cases where Claude can read the code, spot the issue, and apply the fix with almost no back-and-forth.
- Feature additions scale reasonably — adding system information to support emails took 25 minutes[^2] of active work for 13 interactions.
- Large tasks are genuinely productive — the full game jam produced a complete, playable game in about 6 hours of active work over a single Saturday, including an intro screen, attract mode, configurable controls, a tutorial, high scores, and sound effects. Would that have been possible without AI? Probably not.
- The ratio of user to AI messages is telling — in most sessions, Claude sends 5-10x more messages than the user. Each user message triggers a burst of research, code reading, editing, and testing.
## About the costs

You may have noticed the "Cost" column in the table above. Those numbers look scary — $230 for a game jam? $31 for some blog work? But there's an important nuance: these are theoretical API costs, calculated from Anthropic's published per-token pricing. The script counts every input and output token and applies the current rates.

In practice, I've been doing all this work on the Claude Max subscription at $100/month, which is a flat rate regardless of how much you use it. So those $230 worth of API calls for the game jam were effectively included in my monthly subscription. The game jam alone would have cost more than two months of the subscription at API rates — which shows the subscription can be excellent value for heavy users.
Here's my experience with the different plans:
- Free plan — fine for testing the product, but useless for actual work. You'll hit the limits almost immediately.
- Pro plan ($20/month) — not sufficient if you want to work uninterrupted with the best model (Opus). You'll get rate-limited in the middle of tasks, which is frustrating and breaks your flow.
- Max plan ($100/month) — this is the one to go for if you plan to use AI as a first-class citizen in your workflow. The significantly higher usage limits for Opus mean I no longer hit rate limits during work sessions.
## Model cost vs performance

Another cost-related topic worth discussing: should you save money by using cheaper, less capable models?

My experience across multiple AI systems (Claude Opus, Claude Sonnet, and GLM via z.ai) is clear: the top model is worth the price.
Sonnet is roughly equivalent to the best GLM model — both are competent for straightforward tasks. But when things get complicated, the difference is dramatic. At work I had a C# project that stopped compiling after some AI-assisted changes. Both Sonnet and GLM kept going in circles, unable to figure out what was wrong. Opus found the problem in 30 seconds and fixed it.
My take: unless you genuinely cannot afford the top model, working with cheaper models is a false economy. Yes, each request costs less, but sooner or later you will burn through many requests on your better model trying to fix the mess the cheaper one created. The cheapest model per request is not necessarily the cheapest overall if it ends up costing you hours of debugging.
## Bonus: syntax highlighting

While writing this article, we ran into a side quest: the blog's code syntax highlighting was still using Google Code Prettify from 2013. It didn't know what JSON was (everything showed up green), and it certainly didn't support 6502 or 68000 assembly language.

So we replaced it with Prism.js (version 1.30.0), added support for all the languages used across the blog and the OSDK website (C, C++, JSON, JavaScript, PHP, BASIC, Python, Lua, Bash, 6502 assembly), and wrote a custom language definition for Motorola 68000 assembly (download here) — because apparently no syntax highlighting library in 2026 supports the 68000 out of the box.
That's another example of AI-assisted productivity: a detour that could have taken an afternoon was done in about 15 minutes as part of a larger conversation.
## Next steps

This is the first article of the series, and there will be more — but I don't have a fixed plan for what comes next. There's a lot of data to explore and many angles to cover: time analysis, cost breakdowns, specific case studies, tips and tricks...

If there's anything in particular you'd like me to dig into, feel free to let me know in the comments!
[^1]: Basically I prompted Claude and it wrote, updated, and corrected the article based on my feedback.

[^2]: Most of that time was spent on a tricky edge case: my Windows is set to English but I use a Norwegian keyboard layout, and the initial API calls only reported the OS language, not the keyboard language. That required several rounds of back-and-forth and testing before it worked properly.
