Analysis of My Productivity Data

Overview

On 6 December 2018, I began using a pomodoro-technique productivity timer called Be Focused. In addition to counting down intervals, the app allows the user to create tasks and keeps track of how many intervals one completes per task. I had two goals in using this app. First, I wanted to limit how much time I spent unproductively online. I thought that rewarding myself for being productive would be more effective than punishing myself for wasting time. Second, I wanted to nudge myself into doing a variety of activities every day. Beyond looking at the app's weekly reports to see how often I was hitting my initial goal of ten intervals per day, I never analyzed the data until now. As John Steinbeck writes on the opening page of Travels with Charley, "I set this matter down not to instruct others but to inform myself."

From 6 December 2018 through 22 July 2022 (inclusive), the app generated a data file consisting of 9,236 entries, each having four observations. The data has some significant limitations. First, because the app is on my desktop only, I could not record my productivity while on vacation or while working abroad. Second, and relatedly, at one point an OS update prevented the app from saving information until the code was fixed over a week later. All told, 298 of the 1,325 days in the time period under consideration lack data. Third, the tasks are not mutually exclusive. For example, writing work-related emails could fit under two different tasks, and I was not consistent in my classification. Fourth, I did not record when tasks were created or deleted. For example, the task of Google Certificate appears only for a few months over the summer of 2022.

Questions Addressed

  1.  Has my productivity increased?
  2.  Is Sunday my least productive day?
  3.  Were my most productive days inflated by teaching and meetings?
  4.  Was I completing a variety of kinds of tasks?
  5.  Did the weather play a role in my productivity?

Has my productivity increased?

Because the size of the dataset might be unmanageable for Google Sheets, I wrote a script in R to eliminate the duration column (because it never changed) and the status column (because I did not use that feature of the app). I further simplified the data by summing each day's completed intervals. This reduced the data set to 1,027 entries with two observations per entry. With the data set now a manageable size for Sheets, I made the following pivot table and chart:

Year               2018   2019   2020   2021   2022
Sum of intervals     16    645  2,436  3,534  2,605

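Since the 2022 data includes only a little more than half of the year (203 days), it seems safe to predict that 2022 will be even more productive than 2021 and will continue the upward trend. At the pace so far (2,605 intervals in 203 days, or about 12.8 per day), the full year would reach roughly 4,700 intervals, well above the 3,534 of 2021.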
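I did the simplification step in R, but the idea is easy to illustrate in Python. What follows is only a minimal sketch: it assumes each entry has already been reduced to a date and a task name, and the sample rows and task names are invented for illustration rather than taken from the app's export.

```python
from collections import Counter
from datetime import date

# Hypothetical rows: (date, task name), one row per completed interval.
# The real export's column names and date format are not shown here.
entries = [
    (date(2018, 12, 6), "reading"),
    (date(2018, 12, 6), "writing"),
    (date(2018, 12, 7), "reading"),
    (date(2019, 1, 3), "python"),
]

def daily_totals(rows):
    """Count completed intervals per calendar day."""
    return Counter(day for day, _task in rows)

def yearly_totals(per_day):
    """Sum the daily counts by year, as the pivot table does."""
    totals = Counter()
    for day, count in per_day.items():
        totals[day.year] += count
    return totals

per_day = daily_totals(entries)
per_year = yearly_totals(per_day)
print(dict(per_year))
```

Days with no recorded intervals simply do not appear in the daily totals, which is consistent with the reduced data set having fewer rows than there are days in the period.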

Is Sunday my least productive day?

I suspected that Sunday was the least productive day, but I wasn't sure what the other days were like. In R, I added the day of the week information and in Sheets made this bar chart:

I was a little surprised that Monday and Thursday were the most productive days. One possibility is that I usually fast on those days and so the time normally spent cooking and eating (neither of which I keep track of) could have been used to complete tasks that I do track. Similarly, Sundays I tend to cook a more involved breakfast. I also talk with my parents. Again, neither of those activities is tracked. I wondered, though, whether the increase on Monday and Thursday was due to spending more time on my classes as those were the days when I had assignments due. I also tended to favor Thursdays. Similarly, Sunday was typically the day when the fewest students emailed me or submitted course work. This led me to the next question.

Were my most productive days inflated by teaching and meetings?

Since the average number of intervals completed per day was nine, I decided to look at days when I completed ten or more intervals, and days when I completed more than fifteen intervals. Using Python (code on GitHub), I determined that the average contribution of work-related intervals to above-average days (intervals completed > 9) is 25.05%. If you raise the bar a bit, the average contribution of work-related intervals to highly productive days (intervals completed > 15) drops to 19.07%. Lastly, the average number of intervals completed when work was all I did is 3.6. These calculations show that my fear that work was the main driver of my productivity was unfounded.
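To make the calculation concrete, here is a minimal sketch of one way to compute such a figure. It is not the script on GitHub. It assumes that "average contribution" means the mean of the per-day percentages (rather than a pooled percentage), and the set of work-related task names and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical set of task names that count as work-related.
WORK_TASKS = {"teaching", "meetings"}

def work_share_on_busy_days(rows, threshold):
    """Mean percentage of work intervals on days with more than `threshold` intervals.

    `rows` is an iterable of (date, task) pairs, one per completed interval.
    Returns None when no day exceeds the threshold.
    """
    total = defaultdict(int)
    work = defaultdict(int)
    for day, task in rows:
        total[day] += 1
        if task in WORK_TASKS:
            work[day] += 1
    shares = [100 * work[d] / n for d, n in total.items() if n > threshold]
    return sum(shares) / len(shares) if shares else None

# Invented sample: two four-interval days and one two-interval day.
sample = [
    (date(2022, 3, 1), "teaching"), (date(2022, 3, 1), "reading"),
    (date(2022, 3, 1), "writing"), (date(2022, 3, 1), "python"),
    (date(2022, 3, 2), "teaching"), (date(2022, 3, 2), "meetings"),
    (date(2022, 3, 2), "reading"), (date(2022, 3, 2), "reading"),
    (date(2022, 3, 3), "teaching"), (date(2022, 3, 3), "teaching"),
]

print(work_share_on_busy_days(sample, threshold=3))
```

With the real data, the thresholds would be 9 and 15, matching the two cut-offs described above.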

Was I completing a variety of kinds of tasks?

The previous question is a specific version of the broader question of whether I was meeting my goal of pursuing multiple tasks every day and not focusing on, say, learning Python. Because looking at each day individually would not be particularly helpful (I say this having tried), I decided to use the monthly average for the number of different tasks completed. I used Python to clean and filter the data and then to make this chart:

While the trend is in the right direction, it's not consistent. December and May tend to be months with lower task diversity, which might be explained by those being the months in which semesters end, so I had more work than usual to complete at the expense of other sorts of tasks. (The gap on the left side of the chart is the semester when I taught in Luxembourg and could not record my productivity.)
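The monthly measure behind the chart can be sketched as follows. This is an illustration under assumptions, not my actual cleaning script: it takes (date, task) pairs, counts the distinct tasks on each recorded day, and averages those counts per month. Days without any data are left out rather than counted as zero, and the sample rows are invented.

```python
from collections import defaultdict
from datetime import date

def monthly_task_diversity(rows):
    """Average number of distinct tasks per recorded day, keyed by (year, month)."""
    tasks_by_day = defaultdict(set)
    for day, task in rows:
        tasks_by_day[day].add(task)
    counts_by_month = defaultdict(list)
    for day, tasks in tasks_by_day.items():
        counts_by_month[(day.year, day.month)].append(len(tasks))
    return {month: sum(c) / len(c) for month, c in counts_by_month.items()}

# Invented sample rows, one per completed interval.
rows = [
    (date(2021, 5, 3), "reading"), (date(2021, 5, 3), "writing"),
    (date(2021, 5, 3), "reading"), (date(2021, 5, 4), "reading"),
    (date(2021, 6, 1), "python"), (date(2021, 6, 1), "reading"),
    (date(2021, 6, 1), "writing"),
]

diversity = monthly_task_diversity(rows)
print(diversity)
```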

Did the weather play a role in my productivity?

In part because of my work on Friedrich Nietzsche, I was curious to learn whether climatic conditions had an impact on my productivity. In one of his last books, Nietzsche writes: "List the places where men with esprit are living or have lived, where wit, subtlety, and malice belonged to happiness, where genius found its home almost of necessity: all of them have excellent dry air. Paris, Provence, Florence, Jerusalem, Athens—these names prove something: genius depends on dry air, on clear skies" (Ecce Homo, Why I Am So Clever §2). Interestingly, these are not places where Nietzsche had lived, so he is not claiming here that he himself is a genius. And of course I'm not claiming that I am either. Nonetheless I was curious to see whether he was correct, at least in my case. Humidity as a climatic condition is less important today than it was in the 19th century. Central heating and cooling systems mean that most of us live most of the year in roughly the same climate, at least to the extent that we are indoors. Fortunately, I had other weather-related questions that I was curious to pursue.

I began by getting a quick sense of how my productivity was affected by the weather in a very broad way. I used Sheets to calculate the semester for each of the data points, and then created a chart of the overall productivity by each semester:

Officially, the academic year comprises 15-week fall and spring semesters, a 4-week winter term, and a 12-week summer term. This leaves six weeks unaccounted for, so I expanded the winter and summer terms to include those weeks. Nonetheless, it is unsurprising that even the expanded winter term had the lowest productivity. Not only is it by far the shortest of the four terms, but a large portion of it falls during the winter holidays, when I did not typically track productivity. I was surprised by the difference between the spring and the fall. I am assuming that the two anomalies affecting the 2018-2019 academic year roughly cancel each other out: I started collecting data at the very end of the fall semester of 2018 (specifically, I had data for only the last nine days of the 105 days in the semester), and I was abroad for all but two days of the 2019 spring semester.

To home in on one weather-related question, I obtained a dataset from the National Oceanic and Atmospheric Administration (NOAA) for where I was living during the timeframe under consideration. The dataset was significantly larger than my productivity data. With a file size approaching 20 MB, I could not upload it to Sheets. Using Python (code on GitHub), I determined that the set included 44,271 entries, each with 124 observations. I wrote a script to remove all of the entirely empty columns, plus some that either never changed (like the reporting station) or were not relevant to answering my questions (like DailyPeakWindDirection). This reduced the number of columns to 34, and cut the file size almost in half.
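The column pruning can be illustrated with a short sketch. It is not the script on GitHub, and it uses only the standard library; it assumes the rows have been read into dictionaries (for example with csv.DictReader). Apart from DailyPeakWindDirection, the column names and values below are placeholders.

```python
def prune_columns(rows, irrelevant=()):
    """Drop columns that are entirely empty, never change, or are listed as irrelevant.

    `rows` is a list of dicts that all share the same keys.
    """
    if not rows:
        return []
    keep = []
    for col in rows[0]:
        values = {row[col] for row in rows}
        if not values - {"", None}:   # entirely empty
            continue
        if len(values) == 1:          # same value in every row
            continue
        if col in irrelevant:         # judged not relevant to the questions
            continue
        keep.append(col)
    return [{col: row[col] for col in keep} for row in rows]

# Placeholder rows standing in for the weather data.
weather = [
    {"STATION": "X1", "DATE": "2022-07-01", "DailyPeakWindDirection": "270",
     "Sunrise": "530", "Unused": ""},
    {"STATION": "X1", "DATE": "2022-07-02", "DailyPeakWindDirection": "180",
     "Sunrise": "531", "Unused": ""},
]

pruned = prune_columns(weather, irrelevant={"DailyPeakWindDirection"})
print(list(pruned[0]))
```

Deciding which columns are irrelevant remains a judgment call; only the empty and constant columns can be detected mechanically.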

I decided to explore whether there was a relationship between length of day and productivity. I chose length of day because it is one meteorological condition that affects most people, even if they spend the day inside. Determining the correlation between productivity and length of day required first extracting the sunrise and sunset times from the last observation of each day, converting those from decimals into datetime format, calculating the difference between the two, and then running a correlation calculation. The correlation (Pearson's correlation coefficient) is 0.101. We can see the lack of correlation in this hexbin plot:

This kind of visualization, as opposed to a scatterplot, helpfully clusters the data and we can see that, for example, the number of times that I completed ten intervals in a day is relatively constant across the changing day length. Interestingly, the extremes of the day length show that the shortest days can be slightly more productive than the longest ones. Again, I was surprised by this as I thought I was greatly (and negatively) influenced by the short days of winter.
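For readers curious about the day-length calculation, here is a minimal sketch of the steps. It assumes the sunrise and sunset values are clock times stored as numbers such as 712 for 07:12, which may not match the actual file, and it computes Pearson's coefficient by hand rather than with the library call my script used. The function names are my own for this illustration.

```python
from math import sqrt

def hhmm_to_minutes(value):
    """Convert a clock time stored as a number such as 712 (07:12) to minutes after midnight."""
    hours, minutes = divmod(int(value), 100)
    return hours * 60 + minutes

def day_length_hours(sunrise, sunset):
    """Length of day in hours from HHMM-style sunrise and sunset values."""
    return (hhmm_to_minutes(sunset) - hhmm_to_minutes(sunrise)) / 60

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented values: (sunrise, sunset, intervals completed that day).
days = [(712, 1642, 10), (630, 1815, 8), (530, 2030, 11), (545, 2010, 9)]
lengths = [day_length_hours(rise, sset) for rise, sset, _ in days]
intervals = [count for _, _, count in days]
print(round(pearson(lengths, intervals), 3))
```

A coefficient near zero, like the 0.101 reported above, indicates little linear relationship between the two series.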

Conclusion

I had no illusions that I would be able to extract actionable items from this analysis. Productivity is a complex topic with a long history and even more advice. Already in Plato's day there was the proverb that "the beginning is half of every work" (Laws 753e), and the amount of advice has only proliferated since then. Nonetheless, the insight that my assumptions about my productivity were frequently incorrect suggested that I needed better access to the data. Although the Be Focused app provides a reporting function with several options, none of them suited my needs. I created a Python script (code on GitHub) that will analyze each month's data and produce a dashboard like this:

Thinking that it would be more powerful to tell me which tasks I did not complete even once during the month rather than having them potentially get lost in a bar chart, I had the script output an admonishment. In the case illustrated above, it scolded me: You did not do even a single interval of the following: writing.
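The admonishment logic is simple enough to sketch. The following is an illustration rather than an excerpt from the dashboard script; the function name, the list of valued tasks, and the sample rows are assumptions made for the example.

```python
from collections import Counter

def admonishment(valued_tasks, month_rows):
    """Return a scolding that lists valued tasks with zero intervals this month, or None."""
    done = Counter(task for _day, task in month_rows)
    missed = [task for task in valued_tasks if done[task] == 0]
    if not missed:
        return None
    return ("You did not do even a single interval of the following: "
            + ", ".join(missed) + ".")

# Invented month of data: (date string, task) per completed interval.
month_rows = [("2022-07-01", "reading"), ("2022-07-02", "python")]
message = admonishment(["reading", "writing", "python"], month_rows)
print(message)
```

Returning None when nothing was missed lets the dashboard skip the message entirely in a good month.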

My hope is that this feedback will motivate me to pursue my twin goals more effectively: completing more intervals per day and completing a variety of tasks each day.

Earlier I quoted Steinbeck on the personal motivation for this project. Nonetheless, in case other users of productivity apps want to run my Python script, it is available here.

Executive Summary

The analysis of my productivity data demonstrated that while some of my assumptions were correct (e.g., that Sundays were my least productive day), more were incorrect (e.g., that work-related tasks contributed the most to highly productive days). This led me to create a monthly dashboard that compactly visualizes the number of intervals I completed in the previous month for the tasks that I value the most, and compares the total number of intervals completed in the past month with the totals for the previous twelve months. With this information, I will be better able to pursue my most valued tasks as well as to maintain motivation to remain productive.