Commenting my code

A long time ago I wrote about how I felt about comments in my source code. It was back in 2010, but I'm not going to link back to those posts because I wrote them in Spanish. If you are sufficiently motivated, feel free to search the archives.

The story is that, back in the day, I wasn't very fond of comments. I had this personal quest where I had to refactor my code until every single comment was unnecessary. I still think it is a noble mission, but the years have made me realize how hard it is to accomplish.

Add a few years of C development to the mix, where things are as hard to read as ancient hieroglyphs, and my code has evolved to a completely different stage: nowadays half of my writing is not for computers to compile, but for humans to read and understand.

Time and experience do wonders to people.

There's something that hasn't changed, however: I never write comments about the what, but only about the why.

The what of my code should be obvious to any other developer; at least, I try hard to make it so. Sure, there are cryptic lines here and there, but most of the time the code should be enough for the reader. The why, however, is a different story: more often than not I go out of my way to make sure my intentions are clear through comments in the code, and every time I go back and read what I wrote, the ideas I had become apparent and I feel I'm right there, on the same day I came up with them.

I'm pretty much addicted to this, so I really think this is the last stage of the evolution regarding comments. I'll get better at writing them, for sure, but I don't think I can go back to noble missions that are hard to accomplish.

Finished

I always wanted to go to a US college, so back in 2015, I started my Master's in Computer Science at Georgia Tech, and just this month of May 2019, after a lot of work and effort from my entire family, I graduated with a major in Machine Learning.

Incredible experience. A lot of sacrifice for sure, but in my opinion, absolutely worth it — not because of the doors that this could open in the future, but because of all of the knowledge that you get.

I'm happy that it's over, but I would do it all over again if I had to.

The question now is, what's next?

Never forget where you came from

Here is a somewhat cheesy piece of advice for a new manager, but one that I found quite useful as I got more leadership responsibilities a few years ago:

Never forget where you came from.

Some people get promoted and all of a sudden they think they are a big shot. They stop looking at their friends the same way, or hanging with those folks that "stayed behind." They change the way they talk, behave, and the circles they spend time in.

They fail to realize they are still the same person they were before the promotion. Their inability to act like the person who they are speaks about their stupidity and lack of humility. They forgot why they were promoted in the first place: for being who they were.

If you are promoted, go back and do more of all the shit that made you the person you are today. We need more of that, not less. Remember you aren't unique and, most importantly, without the people who supported you in the first place you aren't anyone.

Don't turn your back on the rest of us.

Mixed feelings

After four weeks of working full time with AWS, I have mixed feelings about the current state of cloud development. This is a great time to develop enterprise applications in the cloud: AWS (and Azure, and GCP) makes so many things possible — things that were complex before. You can quickly stitch together services and create a sophisticated, integrated system with much less effort than before.

This is great.

But I also feel that there's so much more that can and needs to be done. Over the past month, I estimate that we have spent 90% of our time worrying about infrastructure and only 10% working on the actual code of the application. Sure, in the end this might be for the greater good, but a lot of things have become too cumbersome and time-consuming. Things that are supposed to be much simpler.

Developers can't just be developers anymore; they have to learn a ton of stuff that wasn't part of their jobs before. Or maybe, this is just what being a modern "developer" means.

I'm looking forward to a time when coding your application becomes the main focus again, and the infrastructure piece "just works." We've made a lot of progress, but we haven't arrived yet.

Importing related files from an Amazon S3 bucket using an AWS Lambda function

There's an Amazon S3 bucket that we need to monitor to process files copied into it. Doing this is pretty straightforward by invoking a lambda function whenever the s3:ObjectCreated:* event is triggered by S3.

The Internet is plastered with examples of how to set up this process. You should be able to get everything up and running after a few minutes following one of those tutorials.

But here is a different twist to this problem: we are going to be receiving files in pairs that need to be processed together. Specifically, we will be receiving an image file (let's assume a .PNG file for simplicity) along with related information (metadata) stored in a .JSON file with the same name. For example, we will be getting a file named image1.png and image1.json copied into the bucket, and we need to make sure they are processed together.

Let's define our problem a little bit more formally:

  • Files will come in pairs, a .PNG file and a .JSON file with the same name.

  • Files might be copied at different times into the S3 bucket. We might get the images first, then the corresponding metadata files at a later date, or vice versa.

  • We need to process the files as a unit. We can't handle either the image or the metadata until we have access to both files.

AWS triggers an event for each object created in the S3 bucket, so for each pair of files, we will be getting two separate lambda invocations. We can't make assumptions about the order of the invocations, or how long until both files are ready, so we need to build some synchronization plumbing to take care of this.
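As a rough illustration of what each of those two invocations receives, here is a minimal sketch that pulls the bucket, the object key, and the shared file name out of an S3 event notification. The function name `pair_ids_from_event` and the bucket name in the usage line are made up for this example; the `Records` / `s3` / `bucket` / `object` layout is the standard shape of S3 event notifications.

```python
import os
from urllib.parse import unquote_plus


def pair_ids_from_event(event):
    """Return a (bucket, key, pair_id) tuple for every record in an S3 event."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications
        # (for example, a space arrives as "+"), so decode them first.
        key = unquote_plus(record["s3"]["object"]["key"])
        # image1.png and image1.json share the same name, "image1",
        # which is what ties the two invocations together.
        pair_id, _extension = os.path.splitext(key)
        results.append((bucket, key, pair_id))
    return results


# Hypothetical event with a single record, trimmed to the fields used above.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "source-bucket"}, "object": {"key": "image1.png"}}}
    ]
}
print(pair_ids_from_event(sample_event))
# [('source-bucket', 'image1.png', 'image1')]
```

Both `image1.png` and `image1.json` map to the same identifier, which is what makes it possible to match the two invocations later.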

The idea behind the solution

I'm sure there are multiple ways to tackle this problem, but we wanted to make things as simple as possible, so we decided to have our lambda collect the files as they show up, and only trigger the processing step whenever we have the pair together.

We can't keep the files in the original S3 bucket because we aren't sure how long it will take for the pair to be ready, and the original files might be removed before we have a chance to collect both of them. This means that we need to copy the files to a separate S3 bucket as soon as we get access to them, and hold them there until we get access to the second file of the pair and have a chance to process them.

To keep track of which file we have and which one we are missing, we can use a DynamoDB table. Whenever the lambda function is invoked, we can check the DynamoDB table to determine whether we have both files of the pair, and only move to the processing step when we do.

The radiography of the lambda function

Here is a high-level description of what our lambda function looks like. Remember this function is invoked with every s3:ObjectCreated:* event triggered by the source S3 bucket:

  1. Copy the file from the source S3 bucket into the target S3 bucket. This target bucket is our temporary space until we are ready to process the pair of files.

  2. Get the record corresponding to the file from the DynamoDB table. We can do this using the name of the file as its identifier.

  3. If a record doesn't exist, create a new one with a status of loading. A missing record means that this is the first file of the pair, so we just need to create the record and do nothing else.

  4. If a record does exist, we can update its status to ready and invoke the processing step.

The code that makes things happen

The gist above shows the Python implementation for this lambda function. Notice that the code makes the following assumptions:

  • Files will be copied in a source S3 bucket that's connected to this lambda function. You can do this by following any of the available samples published as part of AWS' documentation.

  • The lambda function will copy the files to a bucket named temp-bucket. Make sure you change this reference in the code to the name of your own bucket.

  • There's a DynamoDB table created with the name dynamodb-table.

  • There's a (mysterious) invoke_processing_step function that I'm leaving outside of the code. This function receives the name of the file and takes care of processing the pair.
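For readers who want something to experiment with, here is a minimal sketch of the four steps under the assumptions listed above. It is not the original gist: the `handle_file` helper, the `pair_id` key attribute, and the placeholder body of `invoke_processing_step` are assumptions made for this illustration, and `boto3` is imported lazily so the core logic can be tried out with stand-in objects.

```python
import os
from urllib.parse import unquote_plus

TEMP_BUCKET = "temp-bucket"     # bucket name used in the post; change to your own
TABLE_NAME = "dynamodb-table"   # table name used in the post
KEY_ATTRIBUTE = "pair_id"       # assumption: name of the table's partition key


def invoke_processing_step(name):
    """Placeholder: the post leaves the real processing step out of the code."""
    raise NotImplementedError("processing of the pair goes here")


def handle_file(source_bucket, key, s3, table, process=invoke_processing_step):
    """Run the four steps for one created object and return the pair's status."""
    # 1. Copy the file into the temporary bucket.
    s3.copy_object(
        Bucket=TEMP_BUCKET,
        Key=key,
        CopySource={"Bucket": source_bucket, "Key": key},
    )

    # 2. Look up the record, using the file name without extension as identifier.
    pair_id = os.path.splitext(key)[0]
    response = table.get_item(Key={KEY_ATTRIBUTE: pair_id})

    # 3. No record yet: this is the first file of the pair.
    if "Item" not in response:
        table.put_item(Item={KEY_ATTRIBUTE: pair_id, "status": "loading"})
        return "loading"

    # 4. The other file arrived earlier: mark the pair ready and process it.
    #    "status" is a DynamoDB reserved word, hence the #s placeholder.
    table.update_item(
        Key={KEY_ATTRIBUTE: pair_id},
        UpdateExpression="SET #s = :ready",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":ready": "ready"},
    )
    process(pair_id)
    return "ready"


def lambda_handler(event, context):
    import boto3  # imported here so handle_file can be tested without AWS

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        handle_file(bucket, key, s3, table)
```

In this sketch the lookup and the write are two separate calls, so two invocations arriving at almost the same moment could both find no record. A conditional write in DynamoDB is one way to close that gap if it matters for your workload.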

It could be a little bit more complicated

There might be more than two files that we need to process together. In that case, the DynamoDB table will have to store a little bit more information: which files have been read and which ones are missing. Extending the code to support this scenario shouldn't be much more complicated, so I'm leaving that to the reader.
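One possible shape for that extension, sketched with a plain dictionary standing in for the DynamoDB table so the logic is easy to follow: each record keeps the set of extensions received so far, and the group is ready once nothing is missing. The `register_file` name, the `received` attribute, and the three-file group are assumptions for illustration only.

```python
import os

# Hypothetical group: every name must arrive with these three extensions.
EXPECTED_EXTENSIONS = frozenset({".png", ".json", ".txt"})


def register_file(records, key, expected=EXPECTED_EXTENSIONS):
    """Record the arrival of `key` and return (status, missing extensions)."""
    name, extension = os.path.splitext(key)
    # Create the record on the first arrival, reuse it afterwards.
    record = records.setdefault(name, {"status": "loading", "received": set()})
    record["received"].add(extension.lower())
    missing = set(expected) - record["received"]
    if not missing:
        record["status"] = "ready"
    return record["status"], missing


records = {}  # stands in for the DynamoDB table
print(register_file(records, "image1.png"))   # still loading, two files missing
print(register_file(records, "image1.json"))  # still loading, one file missing
print(register_file(records, "image1.txt"))   # ready, nothing missing
```

With a real table, the `received` set could be stored as a string set attribute on the record, and the processing step would be invoked when the status flips to ready.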

You might also want to remove the records from the DynamoDB table as soon as you finish processing them. I'm assuming this is outside of the scope of this lambda function (or at least, it's not part of the main thread that I tried to follow with this post), but keep that in mind.

It came out pretty good

In the end, the code came out pretty clean, and the process seems to be holding up pretty well. I'm curious about the results when we stress-test it by bombarding the bucket with all sorts of files, but I'm confident things will go as expected.

I'd love to hear about other ways to solve this same problem or any considerations that we might have missed when designing this solution. Don't hesitate to reach out if you have any comments.

Looking past the surface

This one took me a long time to learn.

People applying for an open position at your company are much more than what their resumes say. They are more than their education and experience. They are more than their ability to answer questions under pressure or make a good first impression.

They are people. Unpredictable human beings full of surprises.

Sometimes one has to look past the surface and evaluate candidates based on the impact they can make on your team. Instead of only focusing on them as individuals, think about the whole picture. What could improve if this person starts working with us? How is this person going to change things around here?

A lot of good things can happen when people are given the opportunity to work under the right conditions around equally motivated individuals. These things are hard to quantify and they usually don't come up in 60-minute interviews, let alone in 500-word resumes.

OpenAI Gym's LunarLander-v2 Implementation

If you are into Reinforcement Learning, it's very likely that you've heard about OpenAI Gym. It's an amazing platform that you should check out in case you haven't heard about it.

This post is specifically about the LunarLander-v2 environment and my implementation to solve it. This environment consists of a lander that, by learning how to control 4 different actions, has to land safely on a landing pad with both legs touching the ground.

This was my first exciting Reinforcement Learning problem and I'm very proud of the work I did and everything I learned in the process. Here are some cool things I got to use while working on this project:

  • Deep Q-Network (DQN)
  • Deep Neural Networks (DNN)
  • Experience Replay
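To make one of those terms a bit more concrete, here is a minimal sketch of an experience replay buffer, the piece that stores past transitions so a DQN can train on random batches of them instead of only the latest step. It is not taken from the repository mentioned in this post; the `ReplayBuffer` and `Transition` names, the capacity, and the toy values are assumptions for illustration.

```python
import random
from collections import deque, namedtuple

# One step of interaction with the environment.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size memory that forgets the oldest transitions first."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive steps of the same episode.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Toy usage: states are plain integers, actions are one of 4 choices.
buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.push(state=step, action=step % 4, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(2)
```

After five pushes into a buffer of capacity three, only the three most recent transitions remain, and each training batch is drawn at random from those.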

More than anything, this project got me to love Reinforcement Learning and helped me understand the power of it.

Here is the link to the GitHub repo with my solution that also includes the full analysis.

Focus your resume on the value you can provide, not the tools you can use

We all do the same thing: plaster our resumes with every single detail we know or ever heard mentioned. From the most critical technology stack to the smallest recondite tool, we always focus too much on the tools we can handle and forget something much more important.

Our resumes should be more about the value we can provide and much less about the things we will use to achieve that value.

Often, people making hiring decisions do not understand how to solve their problems in the first place, so it's tough for them to make the connection between your list of skills and their needs. I've been on the hiring side enough times to understand this.

Companies have problems, and they are looking for candidates that can solve them. During my interviews, I spend most of my time trying to find a good fit for the candidate in front of me — I'm making the necessary connections in my mind, doing the translation from the list of skills to the value that those skills can afford me. This works when I know what I'm talking about, but it's tough when I'm trying to hire people that know how to do things that are entirely above my head.

You should consider selling yourself better.

Stop focusing on your skills; those are the tools that you can use to provide value, and they are great, and fancy, and buzzwordy, but knowing them is a feeble measure of whether you'll be able to solve any problems. I bet you've never hired a contractor because they told you they had a chainsaw and a hammer, right? You hired them because they showed you they knew how to solve your problem. You can apply the same logic here.

This is how you can do better:

  1. Look at your resume and make sure you are not focusing too much on the tools. If you have more than four or five skills listed there, you are probably giving too many unnecessary details. Try to shrink down the list by removing everything that's not relevant. As an example, I've read too many resumes for a Senior Engineering position listing things like "XML," "JSON," and "Visual Studio .NET."

  2. After the first pass, you can try and consolidate the list even further by abstracting your specific skills to what matters to the client. For example, instead of listing "HTML," "CSS," and "JavaScript," you can say "Web development," or instead of listing "Objective C" and "Swift," you can say "Mobile development."

  3. Your resume should be tailored as much as possible to the position you are seeking, and your experience should reflect what you've been able to accomplish working under similar responsibilities. How were you able to provide value as a "Mobile Developer" in your last company? Did you do something out of the ordinary? Were you able to reduce timelines, or cut budgets, or come up with a novel solution to a problem?

  4. Finally, whenever you can, make your application as much about the company as possible. This might be hard if you are applying to small, lesser-known firms, but sometimes you can find a lot of information online. Identify different areas in the company that you can improve, and use your pitch about how to solve those problems as the introduction to your resume. A cover letter is a perfect way to do this. Tell people how, thanks to your experience, you can provide incredible value to their operation.

Even if you do a little bit of the above, you'll stand above the legion of people that stubbornly keep sending resumes meant for search engines and not hiring managers.

A little bit of salesmanship is something that we can all use to our benefit.

What motivates me?

Yesterday Nelson asked me what exactly motivates me to do what I do every day. A loaded question that made me think for a bit.

I gave him what in retrospect seems like a 75%-complete answer. I also realize that I've never asked myself this same question, or at least, I've never really thought about what the actual answer is.

It seems like a great exercise, and I plan to do a couple of things from now on:

  1. Ask myself this question on a regular basis. I want to make sure I'm spending my time appropriately, and my motives make me feel proud of who I am and what I do.

  2. Ask the people around me this question and learn from what makes them wake up every day.

And with a little bit more time to think about it, here is what I think is a complete answer to what motivates me:

First, I'm doing what I love: I'm helping push the industry forward by making computers do what seems magical for a lot of people that use them. Here "magical" doesn't necessarily mean extraordinary, but instead delightful and extremely useful. I enjoy telling people how my work affects their life. It makes me proud, and all those hours, and work, and sweat seem worth every penny.

Second, my contributions are squarely focused on what I enjoy the most: solving hard problems. Things that you can't simply find by searching the web or reading a book, things that stretch me to enormous lengths, things that make me extraordinarily uncomfortable. Overcoming these challenges is a powerful motivator.

Finally, I get to provide for my family by doing what I love. They say happiness is finding what you love and doing it every day. If on top of that, you also find somebody that's willing to pay you generously for this, even better.

I think this is it. I'll revisit this from time to time.

By the way, what motivates you?

Yet another article about your Machine Learning career

This is what I wish I had read before starting.

This article is supposed to be different from the one I posted a few days ago: A quick guide to get started on Machine Learning and Computer Vision. That one is more of a collection of resources that focus mostly on getting up to speed on Machine Learning and Computer Vision, but it lacks the story part. You know, when you don't know where to start, sometimes you need somebody that guides you step by step from the beginning and doesn't throw you in the middle of a shitton of resources.

That article was about throwing you in the dumpster. This one tries to be helpful for those who want to start doing this for real.

This "guide" (or whatever you want to call it) is more or less what worked for me (and exactly how I'm still doing it, because this is a lifelong learning experience, buddy).

First, let's get the bad news out of the way

If you want to dedicate your life to Machine Learning, there will be math involved. Calculus, linear algebra, statistics, and probability. Do you have to be an expert? Of course not, but you'll have to make peace with the idea of moonlighting while reading math concepts that you forgot ten years ago. If you like math, then this won't be a problem, and if you don't, well, it is not the end of the world, but it will have to be part of your life.

A lot of people recommend starting with a math refresher before anything else, but that would never work for me. There's a lot of math out there, and I wanted to make sure I wasn't overcomplicating my life with things that were not relevant. I began with Machine Learning theory and only looked at the specific math concepts as they got in front of me; so that's what I'd recommend you do as well.

Here you have a list of free online resources from the MIT Open Courseware that cover everything you need (and if you haven't seen the MIT Open Courseware site, consider this your End-Of-2018 gift):

  • Mathematics for Computer Science
  • Linear Algebra
  • Introduction to Probability and Statistics
  • Single Variable Calculus
  • Multivariable Calculus

If you are like me, then you'd like to grab a book or two to put on a shelf and never read. But you know, at least you have it just in case you are bored some day. Here are the books that I'd recommend (some of these I own, some are recommendations from other people, and most of them I haven't read):

You need the theory

Alright, so putting the math aside, like with anything else in life, you'll need some Machine Learning theory to know what you are doing. Of course, you can start messing with things directly and skip the lectures, but this would be like trying to fly a plane without going to pilot school first (except the crashing part).

A lot of people will try to send you to graduate school, and although this is a way, it's not necessary. Today, there's a lot of online coursework that you can take. For a couple of examples, check Udacity and Coursera. For a little bit more specific advice, check A quick guide to get started on Machine Learning and Computer Vision; I added some specifics there.

I had the blessing to go to graduate school. I wasn't thinking about doing Machine Learning, but once inside, I decided to specialize in it. This works, of course, but I know smarter people that didn't pay a ton of money or spend a ton of time slaving in college. Yesterday, companies cared a lot about your credentials, but right now, they are taking anyone with the necessary skills: companies need people that care, like the field, and know how to get stuff done. And you can do all of this without going to school.

So start taking courses and reading papers. Free or paid, doesn't matter. Just start learning. You can also read books, of course, but the papers will keep you on top of what's new, and hot, and buzzwordy. Papers will be harder to read, but a great exercise to get smarter (yes, you'll get smarter) and get a headache from time to time.

Pick a language

Controversial topic, for sure, but I honestly don't care: I picked Python because, by the time Machine Learning became a thing for me, I already had experience with Python. And also, because I believe Python is the best.

There, I said it.

But you can also do R and be perfectly fine with it. I know somebody (yes, a single person) that does R. But he also does Python whenever TensorFlow is involved, so, as I said, do Python.

Here are a few more books that you can add to your collection and never read:

  • Introduction to Machine Learning with Python
  • Python Machine Learning By Example
  • Machine Learning with Python Cookbook

And if you are a software developer, expect a whole lot fewer lines of code to get cool things to happen here. You are probably used to counting your lines of code by the thousand (or heck, even by the million), but here it will be different. The focus is on the data and not on the code, and you'll learn that the hard way like everyone else. Just a heads up that here you'll find average programmers achieving amazing things. Emulate them.

Then, make things happen

This is the fun part, and unfortunately, the part that some people forget: start doing things. Whenever you apply to a job, this is what companies want to see from you: shit that works. Who cares how many papers you've read and how many fancy words you can spell? Show me things that run and work, and you are getting to the next level!

Your coursework will give you a ton of exercises, so make sure you create your GitHub account and upload all your solutions there (you can set them private if the school doesn't like you to share them). Make sure you write a report or some explanation together with the code. If you have nothing else to show, this will be your Plan B.

But if you have time, and you don't have kids that ruin your nights, you can download a dataset from anywhere and do something with it. I don't care what it is, but make it interesting. Just play with the data, and select some algorithms to do something with it. Document your findings, your struggles, and push it to your GitHub account. Repeat three times, and you'll get more from it than from reading all the theory books ever written.

If you are lucky to participate in some real-world project as part of your school or company, then great! You are all set, and headed down the right path. If you are not, another option is to participate in an open-source project (everyone says this, so I'm repeating it, but open-source projects are more intimidating than what people tend to make others believe). I'd recommend asking around (StackOverflow or Quora may help) and finding a good place that allows you to follow along; who knows, you may be able to contribute here and there.

Doing is the way you'll learn the most, so make it happen.

Expect this to be your job

First, you'll spend a lot of your time dealing with data: getting it, cleaning it, making sense out of it, cleaning it some more, and playing with it in every imaginable way.

The data will be your everything.

Then, you'll pick an algorithm, or two, and you'll run your data through it. Sounds easier than it is, but this is pretty much a good summary. It is unlikely that you'll be inventing new algorithms, but instead, you'll be using everything that others have already put together for you. And don't let people minimize this step: as I said, this is tricky stuff. Knowing what to pick and how to apply it is an art that you'll need to master.

Then, you'll communicate your results. This is very important: unless you can get people to understand what you did, your work has no value. Here you'll learn how to talk, write, and visualize your data. You'll need to master the art of making people that know nothing about math, statistics, and probability understand what you need them to know.

Finally, you'll look at your work, decide that it is not that good after all, and go back to the first step to iterate and make it better. Rinse and repeat, my friend. You'll do this until it's good enough or you run out of budget, whichever comes first.

Final remarks

Be ready to suck at this for a long time. I've been doing it for some time, and I'm still horrible at it. I have hope, and I'm certainly better than what I was last week, but this is a forever thing, and only practice and time will help you get there.

And of course, don't let this discourage you. Nobody can expect to show up at training camp and beat those who have been playing for five straight years. But if you get your sorry ass out of the comfort zone and start dedicating some time to this, you'll make it just like everyone else.

Have a Merry Christmas and Happy New 2019!