GIL, Mojo, and a Soggy Christmas Tree

GIL

Recently I have been working on a project involving a huge (roughly half a terabyte) database of fanfiction. I want to run some analyses on the data, but given its size I have to be careful about what code I run. My first thought was to use multithreading to cut down the run time. I figured this would be easy enough: isolate the code I want to execute on each unit of the database, split the database into equal-sized portions according to thread count, and finally run the threads in parallel, each on its own portion.
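For the curious, that plan might look roughly like the sketch below. This is not my actual project code: `analyze_unit` is a hypothetical stand-in (a simple word count), and I am assuming the units fit in a plain Python list, which the real database certainly would not.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, n_parts):
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks each take one leftover item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def analyze_unit(text):
    # Hypothetical per-unit analysis: count the words in one story.
    return len(text.split())


def analyze_chunk(chunk):
    return [analyze_unit(unit) for unit in chunk]


def run_threaded(units, n_threads=4):
    chunks = split_evenly(units, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() yields results in the same order as the input chunks.
        per_chunk = list(pool.map(analyze_chunk, chunks))
    # Flatten the per-chunk results back into one list.
    return [count for chunk_result in per_chunk for count in chunk_result]
```

The code is correct as far as it goes, but because `analyze_unit` is pure Python, the threads mostly end up taking turns instead of running side by side.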

Enter GIL. I had read threads and heard complaints about Python’s global interpreter lock, but it had never really affected my life, so I never took the time to understand it. Well, now it is affecting my life, and thanks to a wonderful explanation from Victor Skvortsov I somewhat understand why. Definitely pop into the linked post for some nice details on GIL. For me, the big takeaway is that GIL lets only one thread execute Python bytecode at any given moment, so threads running pure Python code take turns rather than running in parallel. Threading can still help when a thread is waiting on something that releases GIL, such as an IO call or a call into a C library function, because other threads can keep moving forward in the meantime. The upshot is that if I want to make use of threading in a substantial way, I will need to ensure that the bulk of my computations are executed by C code that releases GIL. Luckily, many numpy operations fall into this category.
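A small experiment makes the difference visible. This is only a sketch, assuming a standard CPython build: `fake_io` uses `time.sleep` as a stand-in for a blocking IO call (sleeping releases GIL), `cpu_bound` is a pure-Python loop, and both names, along with `timed_in_threads`, are made up for this illustration.

```python
import threading
import time


def fake_io(seconds):
    # time.sleep releases GIL while waiting, much like a blocking read would.
    time.sleep(seconds)


def cpu_bound(n):
    # Pure-Python arithmetic: the thread holds GIL while it runs this loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_in_threads(target, arg, n_threads):
    """Run target(arg) in n_threads threads and return the elapsed wall time."""
    threads = [threading.Thread(target=target, args=(arg,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Four 0.2 s sleeps overlap, so this should take about 0.2 s, not 0.8 s.
    print("io-like:  ", timed_in_threads(fake_io, 0.2, 4))
    # Four CPU-bound loops should take roughly as long as running them back to back.
    print("cpu-bound:", timed_in_threads(cpu_bound, 2_000_000, 4))
```

The exact timings depend on the machine, so I have left the numbers out, but the IO-like case should scale with thread count while the CPU-bound case should not.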

This revelation is going to cause quite an upset in how I thought I would be able to proceed with my project. I’ve been stalling for quite a while on finishing the first blog post about the project, so this setback has prompted me to run a basic analysis on a small portion of the database so that I have something to show off in the post. I’ll have to keep working out how to approach analyzing this behemoth of a dataset. I believe the solution will have to extend much further than multithreading.

Mojo

One potential contributor to the solution is Mojo, an up-and-coming language that aims to be a superset of Python and to serve the needs of AI devs by extending the Python syntax to interface with scalable hardware. I’ve yet to jump in deep, but if I understand the project correctly, there could be an opportunity to leverage Mojo for my analysis.

Soggy Christmas

skiing

I spent the last few days in Sandpoint with some good friends. Went skiing for the first time in my life! Unfortunately the snow was pretty bad, though I was still able to learn a lot. I had a bit of a trial by fire after taking a wrong turn down the hill. Hit a steep, icy, mogul-covered stretch and got my ass kicked by the mountain more times than I care to recount. At the end of it all I feel like I could handle some basic hills on my own next time I get out on a mountain, so that’s a W in my book.

The last day we were there, we put up a big Christmas tree out on the docks.

skiing

The wind was a bit strong overnight, though.

skiing

Couple more random thoughts

  • LWN.net seems like a rad place to follow.

  • Stumbled upon SCOWL, a handy database for spell checking, while looking for a list of words to use in a game of hangman I am coding to learn Rust.

  • I want to read more “Python Behind the Scenes” from the blog I linked to earlier. It addresses so many things I have been curious about for a while.