Technical Blog

Writings on MLOps, machine learning, software engineering, and programming.

For personal writings on Afghanistan, books, languages, and life, see the personal blog.

Is this data actually good, or does it just look good?

agentic-rl

isafpr

datasets

datavalidation

huggingface

Before trusting my ISAF press-release dataset to train and evaluate an RL model, I audited its gold labels and found misspelled provinces, ambiguous values where ‘unknown’ is the honest answer, and a subtle train/test leak. I cleaned it without overwriting the original.

2026-06-28

My first RL environment: three stages, no trainer

agents

llms

reinforcement-learning

agentic-rl

isafpr

My first hands-on RL day. Before any weights move, three ways to shape good and bad structured-extraction traces — filter, reward-weight (with a surprise softmax-and-temperature detour), or push away with full RL — then my first verifiers environment: a dataset plus a rubric, no trainer.

2026-06-20

How to read an RL framework without believing its README

agents

llms

reinforcement-learning

agentic-rl

The RL tooling space grows weekly. Rather than memorise frameworks, I read each one against a five-stage mental model — and stay skeptical of what its README claims to do.

2026-06-19

GRPO without the maths: how an RL trainer nudges the weights

agents

llms

reinforcement-learning

isafpr

agentic-rl

A plain-language walk through Group Relative Policy Optimisation — baseline, advantage, and how a group of attempts becomes one weight update — with a runnable toy loop.

2026-06-18

Reward design for RL, grounded in a structured-extraction task

agents

llms

reinforcement-learning

isafpr

agentic-rl

Grounding RL vocabulary — trace, task, environment — and the three families of reward in a real structured-extraction task. With a dirty-gold-label gotcha that argues for looking at your data, composing rewards via gates and weights, and reward-hacking examples from Shopify’s Sidekick.

2026-06-17

The Five Stages of Reinforcement Learning

agents

llms

reinforcement-learning

agentic-rl

The five stages of a reinforcement-learning setup for LLM agents — tasks, harness, rollout, reward, trainer — mapped onto my own structured-extraction work, and why reward design is the part that interests me most.

2026-06-16

Off the frontier API: distillation, graded SFT, and RL for agents

agents

llms

reinforcement-learning

agentic-rl

Why and when you’d train your own model instead of paying for a frontier API: a practitioner’s primer on distillation, graded SFT, and reinforcement learning for agents. I also get into some of the costs of this approach.

2026-06-13

Trying to instrument an agentic app with Arize Phoenix and litellm

llms

agents

evals-course

evaluation

miniproject

hinbox

Trying to get Phoenix to work with litellm to instrument my LLM calls, grouping spans together as traces.

2025-06-04

Testing out instrumenting LLM tracing for litellm with Braintrust and Langfuse

llms

agents

evals-course

evaluation

miniproject

hinbox

Third time’s a charm: setting up instrumentation with Braintrust, Langfuse and litellm. Braintrust ended up not being as ergonomic as Langfuse, so I switched over midway.

2025-06-04

Building hinbox: An agentic research tool for historical document analysis

llms

agents

evals-course

evaluation

miniproject

hinbox

research

Lessons learned from working on an entity extraction system for historical research that automatically processes documents to create structured knowledge databases, developed as a practical testbed for systematic AI evaluation techniques.

2025-05-30

Error analysis to find failure modes

evals-course

llms

llmops

evaluation

A systematic 5-step process for analysing LLM application failures through error analysis and clustering techniques to identify and categorise failure modes for iterative improvement.

2025-05-23

How to think about evals

evals-course

llms

llmops

evaluation

Key insights from the first session of the Hamel/Shreya AI Evals course, focusing on a ‘three gulfs’ mental model (specification, generalisation, and comprehension) for LLM application development and the importance of systematic evaluation and improvement processes.

2025-05-20

First impressions of the new Gemini Deep Research (with 2.5 Pro)

agents

google

tools

openai

research

Some quick first impressions of Google DeepMind’s new iteration of Gemini Deep Research that uses their 2.5 Pro model.

2025-04-09

Learnings from a week of building with local LLMs

claude

llm

llms

miniproject

openai

prompt-engineering

softwareengineering

tools

Insights from a week of building an LLM-based knowledge database, highlighting experiences with local models, prompt engineering patterns, development tools like Ollama and RepoPrompt, and software engineering principles that enhance AI-assisted development workflows.

2025-03-16

Building an MCP Server for Beeminder: Connecting AI Assistants to Personal Data

tools

anthropic

claude

miniproject

I built a Model Context Protocol (MCP) server for Beeminder to connect AI assistants with my personal goal tracking data. Here’s how I implemented this integration using Claude Desktop, and what I learned about MCP development.

2025-02-21

Tinbox: an LLM-based document translation tool

translation

llm

llms

languages

research

miniproject

python

tools

Explores an open-source tool I built that tackles the challenges of large-scale document translation using LLMs. Born from my experience as both a historian working with Afghan primary sources and a developer, it offers innovative solutions to common translation problems through smart chunking algorithms and local model support, making multilingual content more accessible for researchers and developers alike.

2025-02-16

Starting the Hugging Face Agents course

agents

huggingface

skillbuilding

llmops

llms

Some observations on completing unit one of the new course hosted by Hugging Face.

2025-02-11

AI Engineering Architecture and User Feedback

books-i-read

llm

llms

llmops

evaluation

My notes on chapter 10 of Chip Huyen’s ‘AI Engineering’, an exploration of modern AI system architecture patterns and user feedback mechanisms, covering the evolution from simple API integrations to complex agent-based systems, including practical implementations of RAG, guardrails, caching strategies, and systematic approaches to gathering and utilizing user feedback for continuous improvement.

2025-02-09

Notes on ‘AI Engineering’ chapter 9: Inference Optimisation

books-i-read

inference

llm

llms

hardware

Chapter 9 is a guide to ML inference optimization covering compute and memory bottlenecks, performance metrics, and practical implementation strategies. This technical summary explores model-level, hardware-level, and service-level optimizations, with detailed explanations of batching strategies, parallelism approaches, and attention mechanisms - essential knowledge for ML engineers working to reduce inference costs and improve system performance.

2025-02-07

Dataset Engineering: The Art and Science of Data Preparation

books-i-read

datasets

datalabelling

llm

llms

finetuning

Explores Chapter 8 of Chip Huyen’s ‘AI Engineering,’ examining the intricate landscape of dataset engineering through the lenses of curation, augmentation, and processing.

2025-02-05

Notes on ‘AI Engineering’ (Chip Huyen) chapter 7: Finetuning

books-i-read

finetuning

llm

llms

Explores when and how to implement finetuning effectively, looking at key technical aspects like memory considerations and PEFT, while emphasising fine-tuning as a last-resort approach after simpler solutions like prompt engineering and RAG have been exhausted.

2025-01-26

Notes on ‘AI Engineering’ (Chip Huyen) chapter 6

books-i-read

llm

llms

agents

rag

evaluation

This chapter was all about RAG and agents. It’s only 50 pages, so clearly there’s only so much of the details she can get into, but it was pretty good nonetheless and there…

2025-01-24

Notes on ‘AI Engineering’ (Chip Huyen) chapter 4

books-i-read

llm

llms

evaluation

A comprehensive guide to AI system evaluation, synthesising Chapter 4 of Chip Huyen’s ‘AI Engineering.’ These notes detail practical frameworks for assessing AI models, covering evaluation criteria, model selection strategies, and pipeline implementation, while maintaining a balanced perspective between academic rigour and real-world application needs.

2025-01-22

Notes on ‘AI Engineering’ (Chip Huyen) chapter 3

books-i-read

llm

llms

evaluation

Really enjoyed this chapter. My tidied notes from my readings follow below. 150 pages in and we’re starting to get to the good stuff :)

2025-01-21

Notes on ‘AI Engineering’ (Chip Huyen) chapter 1

books-i-read

llm

llms

finetuning

prompt-engineering

A detailed analysis of Chapter 1 from Chip Huyen’s ‘AI Engineering’ book, covering the transition from ML Engineering to AI Engineering, the three-layer AI stack, and modern development paradigms. Includes insights from a study group discussion on enterprise adoption challenges and emerging evaluation techniques.

2025-01-19

Final notes on ‘Prompt Engineering for LLMs’

llm

prompt-engineering

books-i-read

evaluation

Detailed notes covering Chapters 10 and 11 of ‘Prompt Engineering for LLMs’ by Berryman and Ziegler, focusing on LLM application evaluation and future trends. Chapter 10 explores comprehensive testing frameworks including offline example suites and online AB testing, while Chapter 11 discusses multimodality, user interfaces, and core principles for effective prompt engineering. Includes personal insights on the book’s emphasis on completion models versus chat models.

2025-01-17

Assembling the Prompt: Notes on ‘Prompt Engineering for LLMs’ ch 6

llm

prompt-engineering

books-i-read

A detailed breakdown of Chapter 6 from ‘Prompt Engineering for LLMs,’ examining prompt structure, document types, and optimization strategies for effective prompt engineering, with practical tips on information positioning and context selection within prompts.

2025-01-13

Prompt Content: Notes on ‘Prompt Engineering for LLMs’ ch 5

llm

prompt-engineering

books-i-read

RAG

Chapter 5 of ‘Prompt Engineering for LLMs’ explores static content (fixed instructions and few-shot examples) versus dynamic content (runtime-assembled context like RAG) in prompts, offering tactical guidance on implementation choices, tradeoffs, and potential pitfalls while emphasising practical examples throughout.

2025-01-12

Starting to read Prompt Engineering for LLMs

llm

prompt-engineering

books-i-read

tokenisation

Summary notes from the first two chapters of ‘Prompt Engineering for LLMs’.

2025-01-09

All the things I learned while trending on Hacker News

llms

miniproject

finetuning

isafpr

evaluation

nlp

I was on the front page of Hacker News for my last two blog posts and I learned various things from the discussion and scrutiny of my approach to evaluating my finetuned LLMs.

2024-07-07

My finetuned models beat OpenAI’s GPT-4

nlp

afghanistan

llms

miniproject

finetuning

isafpr

evaluation

Finetunes of Mistral, Llama3 and Solar LLMs are more accurate on my test data than OpenAI’s models.

2024-07-01

How to think about creating a dataset for LLM finetuning evaluation

llms

finetuning

isafpr

afghanistan

datasets

evaluation

miniproject

I summarise the kinds of evaluations that are needed for a structured data generation task.

2024-06-25

One-click LLM finetuning with Predibase, OpenPipe and OpenAI

nlp

llms

miniproject

finetuning

isafpr

I tried out some services that promise to simplify the process of finetuning open models. I describe my experiences with Predibase, OpenPipe and OpenAI.

2024-06-17

Finetuning my first LLM(s) for structured data extraction with axolotl

nlp

afghanistan

llms

miniproject

finetuning

isafpr

I finetuned my first LLM(s) for the task of extracting structured data from ISAF press releases. Initial tests suggest that it worked pretty well out of the box.

2024-06-15

Evaluating the Baseline Performance of GPT-4-Turbo for Structured Data Extraction

nlp

afghanistan

datalabelling

llms

isafpr

miniproject

evaluation

I evaluated the baseline performance of OpenAI’s GPT-4-Turbo on the ISAF Press Release dataset.

2024-06-03

Structured Data Extraction for ISAF Press Releases with Instructor

nlp

afghanistan

datalabelling

isafpr

llms

miniproject

I used Instructor to understand how good LLMs are at extracting data from the ISAF Press Releases dataset. They did pretty well, but not across the board.

2024-06-02

Introducing the Afghanwire Dataset: A Unique Collection of Translated Afghan Media Articles from 2006-2009

miniproject

afghanistan

datalabelling

datasets

nlp

llms

isafpr

I’m publishing a unique new dataset of Afghan newspaper and magazine articles from the 2006-2009 period. This collection of over 7990 articles was originally translated from Dari and Pashto and published by Afghanwire, a media monitoring organisation that I co-founded and ran in Kabul at the time.

2024-04-01

Writing a custom Terraform provider to deploy Huggingface Spaces

devops

miniproject

terraform

skillbuilding

I worked on this short project to allow people to create/deploy Huggingface Spaces using Terraform (instead of via the API or using the website).

2024-03-31

Publishing the ISAF Press Releases dataset

miniproject

afghanistan

datalabelling

datasets

nlp

llms

I published a dataset from my previous work as a researcher in Afghanistan. It consists of press releases about military operations as well as full annotations showcasing information extracted from those press releases. It has value as a historical artifact but potentially could be used as an LLM evaluation task as well.

2024-03-24

Automating database backups with Tarsnap

databases

skillbuilding

softwareengineering

tools

miniproject

I added a cronjob to automate database backups for my MathsPrompt questions.

2023-07-24

Building MathsPrompt: a tool to help me review and practice problems for my degree

openai

llms

mathematics

rust

mu123

q31

skillbuilding

softwareengineering

tools

miniproject

I built a tool to help me practice the parts of mathematics that I find hardest. I have also been reading some books about Rust and wanted to play around with it, so I used it for the server / backend.

2023-07-23

Terraform Input Variables

terraform

devops

softwareengineering

All the ways you can set input and local variables when using Terraform.

2023-06-22

Tokenizer Links

nlp

balochi-language-model

tokenisation

links

Some links and random observations relating to tokenisation as gathered over the past week.

2023-06-04

Tokenizing Balochi with HuggingFace’s Tokenizer and FastAI/Spacy

nlp

balochi-language-model

tokenisation

balochi

I explore language tokenization using FastAI, Spacy, and Huggingface Tokenizers, with a special focus on the less-represented Balochi language. I share the challenges I faced due to language-specific limitations, my initiative to expand language metadata, and my plans to assess and enhance tokenization efficiency.

2023-06-03

The What, Why, and How of Tokenisation in Machine Learning

nlp

balochi-language-model

tokenisation

The basics around the tokenisation process: why we do it, the spectrum of choices when you get to choose how to do it, and the family of algorithms most commonly used at the moment.

2023-06-01

Building a Balochi Language Dataset for NLP Applications

balochi

nlp

balochi-language-model

ethics

datasets

I share my journey of building language models for Balochi, a language with few digital resources. I discuss assembling a dataset of 2.6 million Balochi words.

2023-05-29

The Risks of Language Models in Minority Languages

balochi

nlp

balochi-language-model

deep-learning

ethics

The dual-edged nature of developing a language model for the Balochi language, weighing potential benefits like improved communication, accessibility, and language preservation against serious risks such as misuse by state actors for surveillance and power consolidation, and the unintentional promotion of linguistic monoculture.

2023-05-22

Low-resource language models: making a start with Balochi

balochi

nlp

balochi-language-model

deep-learning

The Balochi language is underrepresented in NLP. I’m interested in contributing to the field by building a language model for Balochi from scratch and contributing training resources and datasets along the way.

2023-05-21

Finishing MU123

mathematics

mu123

q31

I completed the first module from my maths degree with the Open University. Highlights were quadratic equations, trigonometry and exponential functions.

2023-05-14

Exponents and Logarithms: a MU123 review

mathematics

mu123

q31

I delved into exponents and logarithms in my Open University Maths degree, discovering their practical applications and connections to concepts like Euler’s number. Gaining a deeper understanding, I enjoyed manipulating symbols and working with these fascinating mathematical tools.

2023-05-02

Terraform for the Uninitiated: Demystifying Your First Codebase

terraform

softwareengineering

devops

Learn the essentials of working with Terraform as a beginner, including basic commands like init, plan, apply, and destroy. Gain insights into code structure, variables, outputs, and providers while exploring a new codebase.

2023-04-29

How to remove a commit (or two) from your git branch

git

softwareengineering

versioncontrol

Instructions on how to remove a commit from your git logs.

2023-04-28

The Trick Is The Thing, Part II

mathematics

mu123

q31

deeplearning

I’ve enjoyed learning about quadratic equations and trigonometry for my Maths degree, and am struck by how many incremental steps along the way contributed to the total edifice of understanding.

2023-03-25

Building Blocks For Better Stable Eights

computervision

fastai

parttwo

An impromptu continuation of the last blog, where I use perceptual loss to get the updates to my random noise image that I wanted and finally manage to ‘generate’ an image of the digit eight.

2023-03-18

Tricking my digits classifier with diffusion

computervision

fastai

parttwo

I accidentally built a way to adversarially generate handwritten images that seem to be of the number eight, but aren’t. This blog showcases an experiment I made around the core process going on in the generative diffusion process.

2023-03-05

On mathematical literacy

mathematics

mu123

q31

Thinking aloud about how to tie a collection of mathematical ‘tricks’ and operations together in some sort of logical and rounded whole.

2023-01-01

From the foundation up: Fashion-MNIST basics from Lesson 10

computervision

fastai

parttwo

Notes and some personal exploration following through the lesson 10 course materials from FastAI part 2. We cover the basics of loading in our data and generating our matrix.

2022-10-24

Deep learning tricks all the way down, with a bit of mathematics for good measure

computervision

fastai

parttwo

Notes and reflections based on the first lesson (aka ‘lesson 9’) of the FastAI Part II course. This covers the fundamentals of Stable Diffusion, how it works and some core concepts or techniques.

2022-10-17

Avoiding BIDMAS, or how J does notation

mathematics

mu123

q31

notation

I learned about prefix, postfix and infix notation, and how J evaluates mathematical expressions in a way that makes the BIDMAS rules unnecessary.

2022-10-16

Storing Bytes: what data serialisation is and why you need it for machine learning

redactionmodel

computervision

mlops

python

tools

zenml

I explain the basics around data serialisation and deserialisation, why it’s a commonly-encountered topic, and showcase where I had to implement some custom logic to serialise custom Python objects used in a computer vision project.

2022-09-07

It takes a tribe: how I’m thinking about putting my object detection model into production

tools

redactionmodel

computervision

mlops

There are many pieces involved when deploying a model. This post covers the ones that relate to my object detection model and I explain how I’m going to put together the pipelines that will drive a continuous training loop once it’s all up.

2022-05-31

More Data, More Problems: Using DVC to handle data versioning for a computer vision problem

tools

redactionmodel

computervision

mlops

I show you why you probably want to be versioning your data alongside your code. I introduce the basic functionality of DVC, the industry-standard tool for data versioning. I also explain specifically how I’m using DVC for my computer vision project.

2022-05-24

Redaction Image Classifier: NLP Edition

fastai

nlp

partone

I train an NLP model to see how well it does at predicting whether an OCRed text contains a redaction or not. I run into a bunch of issues when training, leading me to conclude that training NLP models is more complicated than I’d at first suspected.

2022-05-21

A neural network for Fashion MNIST data

fastai

computervision

partone

The final post in this series looking at chapter 4 of the fastai book, where we construct a very simple 3-layer neural network which learns to distinguish a pullover from a dress.

2022-05-15

Using the seven-step SGD process for Fashion MNIST

fastai

computervision

partone

I apply all the lessons we’ve learned so far on the Fashion MNIST dataset. This requires us to learn a few new concepts like optimisers, ReLU, nonlinearity and so on.

2022-05-14

Stochastic Gradient Descent: a mini-example of the whole game

fastai

computervision

partone

This short post shows how you iterate through a simple example of optimising three values as passed into a quadratic equation/function. We use SGD to optimise these.

2022-05-13

Some foundations for machine learning with PyTorch

fastai

computervision

partone

I outline the basic process that a computer uses when training a model, greatly simplified and all explained through the lens of PyTorch and how it calculates gradients. These are some pre-requisite foundations that we will later apply to our Fashion MNIST dataset.

2022-05-12

A dress is not a pullover: learning about PyTorch Tensors and pixel similarity using the Fashion MNIST dataset

fastai

computervision

partone

I read part of chapter four of the fastai course book, learning about a naive approach to image classification (sort of!)

2022-05-11

A painless way to create an MVP demo using computer vision models

fastai

computervision

redactionmodel

tools

I created a few deployed MVP demos showcasing models I’d created while participating in the fastai course, uploading them to the Huggingface Hub and using a Gradio Demo hosted on Huggingface Spaces.

2022-05-07

How my pet cat taught me a lesson about validation data for image classification

fastai

computervision

partone

I learn a valuable lesson about how a model often will ‘cheat’ when training and sometimes the solution is a separate held-out set of ‘test’ data which can give a more accurate assessment of how well the model is performing.

2022-05-02

How to trust the data you feed your model: alternative data validation solutions in a computer vision context (part 3)

tools

redactionmodel

computervision

datavalidation

In this third and final post on data validation for the computer vision context, I cover some alternative tools that you might want to consider, from Evidently to the humble ‘assert’ statement. I conclude by setting out some guidelines for when you might want to be doing data validation and which tools might be more or less appropriate for your specific problem.

2022-04-28

How to trust the data you feed your model: data validation with Great Expectations in a computer vision context (part 2)

tools

redactionmodel

computervision

datavalidation

In this second post on data validation for the computer vision context, I show how you can use the automatic profiling feature of the Great Expectations library to get you started with increasing your confidence in your object detection annotations.

2022-04-26

How to trust the data you feed your model: data validation with Great Expectations in a computer vision context (part 1)

tools

redactionmodel

computervision

datavalidation

An overview of the problem that data validation seeks to solve, explored through the lens of an object detection problem and some of the tradeoffs that such an approach might bring. I introduce and simplify the high-level concepts you need to use the Great Expectations library.

2022-04-19

‘I guess this is what data-centric AI is!’: Performance boosts after training with synthetic data

tools

redactionmodel

computervision

I show how adding synthetic data has improved my redaction model’s performance. Once I trained with the synthetic images added, I realised a more targeted approach would do even better.

2022-04-06

Some characteristics of best-in-class ML portfolio projects

computervision

skillbuilding

I wrote about some of the things that go into creating a really great portfolio project for machine learning. For this post I’m less interested in the technical achievements than I am in how it is presented.

2022-04-04

Building my own image to use IceVision with Paperspace

tools

docker

computervision

I set up a new Paperspace project that uses a custom Docker image to provision its environment, saving me a bunch of initial installation time and dependency bug pain. A huge productivity win!

2022-03-25

Starting Docker In A Month Of Lunches

tools

dockerinamonthoflunches

books-i-read

I’m reading Elton Stoneman’s ‘Learn Docker in a Month of Lunches’ and blogging as I learn along the way. In chapters 1-3 we learn about the context for Docker as well as some basic commands for running and building containers.

2022-03-21

Figuring out why my object detection model is underperforming with FiftyOne, a great tool you probably haven’t heard of

redactionmodel

computervision

tools

debugging

jupyter

I used the under-appreciated tool FiftyOne to analyse the ways that my object detection model is underperforming. For computer vision problems, it’s really useful to have visual debugging aids and FiftyOne is a well-documented and solid tool to help with that.

2022-03-12

Incremental Improvements to my Redaction Detection Model

redactionmodel

computervision

tools

I used a series of techniques to improve the performance of my model while creating a pathway to (hopefully) bigger gains going forward.

2022-03-03

Three Python Helpers for Parsing Inputs

python

tools

The parse, yarl and datefinder packages are all ways in Python to help parse input data of different formats and types. Nothing essential here, but useful nonetheless.

2022-02-27

It’s raining bboxes: how I wrote a Python script to create 2097 synthetic images to help improve my machine learning model

redactionmodel

computervision

python

tools

I iterated through several prototypes to get to a script that could autogenerate synthetic training data for my computer vision model. I hoped to bootstrap my training to get a big jump in model performance.

2022-02-10

What are invariants and how can they help make your Python classes more robust?

robustpython

python

books-i-read

Chapter 10 covers the last of the user-defined types explored in ‘Robust Python’: classes. We learn what an ‘invariant’ is and how to decide whether to use a data class or a class when rolling your own types.

2022-02-08

Upgrade your Python dicts with data classes

robustpython

python

books-i-read

Chapter 9 of ‘Robust Python’ dives into the uses of data classes, a user-defined datatype in which you can store heterogenous data together. They help formalise implicit concepts within your code and as a result also improve code readability.

2022-02-05

How and where to use enums in Python

robustpython

python

books-i-read

The eighth chapter of Patrick Viafore’s book, ‘Robust Python’, gets into enums, which you can use when you have a grouping of some constants that belong together.

2022-01-30

Using mypy for Python type checking

robustpython

python

books-i-read

Reflections on the sixth and seventh chapters of Patrick Viafore’s book, ‘Robust Python’. We slowly wind down our discussion of type hints in Python code and think through using mypy and how to introduce type hints to a legacy codebase.

2022-01-22

Using type annotation with collections in Python

robustpython

python

books-i-read

Reflections on the fifth chapter of Patrick Viafore’s book, ‘Robust Python’. We learn about how to use type annotations when collections (lists, dictionaries and sets, primarily) are involved.

2022-01-18

A Midway Report on my Computer Vision Project

python

fastai

tools

redactionmodel

A report midway through my computer vision project to detect the presence of redactions on government documents.

2022-01-16

Different ways to constrain types in Python

robustpython

python

books-i-read

Reflections on the fourth chapter of Patrick Viafore’s recent book, ‘Robust Python’. We learn about the different options for combining types and constraining exactly which sets of types are permitted for a particular function or variable signature.

2022-01-08

Learning about ‘nbdev’ while building a Python package for PDF machine learning datasets

python

jupyter

fastai

tools

Some early thoughts on the benefits and possible drawbacks of using fastai’s ‘nbdev’, a suite of literate programming tools that allows you to build Python software packages from Jupyter notebooks.

2022-01-06

Getting practical with type annotations and `mypy`

robustpython

python

books-i-read

Reflections on the third chapter of Patrick Viafore’s recent book, ‘Robust Python’. We get some quick practical examples of how to use type annotation and how to use tools like mypy to analyse how typed values pass through your code.

2022-01-03

Counter: a shortcut to counting iterables in Python

python

A nice little helper from the Python standard library.

2022-01-01

What’s special about types in Python?

robustpython

python

books-i-read

Reflections on the second chapter of Patrick Viafore’s recent book, ‘Robust Python’. We learn about types and how they fit into Python.

2021-12-30

Exploring J, an array programming language

What I have learned so far about why the J language exists and what problems it tries to solve.

2021-12-29

What makes code robust?

robustpython

python

books-i-read

Reflections on the first chapter of Patrick Viafore’s recent book, ‘Robust Python’.

2021-12-29

A Taxonomy of Redaction

redactionmodel

A brief analysis of some of the types of redactions that are commonly found in FOIA documents. I use these as the dataset to train an object detection model for redactions.

2021-12-15