Building software is hard. Building correct software - software that does what it is supposed to do - is even harder. There are many well-known tools for achieving correctness after the code is written: manual QA, linters, tests, code reviews, etc. However, ensuring correctness during or even before writing the code - or making it very hard to produce incorrect code - is also possible. There isn’t a single tool, technique, or framework that magically makes your software correct. Achieving it requires some discipline and adherence to a design principle: “design for correctness.”
So, you now have a mysterious Tech Lead role. It might have happened through a promotion (congrats!), a team reorganization, a job change, or in any of a few dozen other ways. No matter which path took you here, things will never be the same. The most substantial change is that you’re no longer a 100% individual contributor - your scope of responsibility is broader than one person can handle alone. To be successful, you’ll need to undergo a psychological change - let go of controlling (or even doing yourself) many things and trust your team to get them done. On the other hand, there are some subtle, easy-to-overlook aspects of work you should influence and shape to make the team effective and efficient. In this post, I offer my views on what a Tech Lead should and should not do.
The launch we covered in the previous post was a major milestone, but not the final destination. In fact, it was a solid foundation for the many improvements made later - from small bugfixes and tuning to supporting new business initiatives. The design choices we made - event sourcing, statefulness, a distributed architecture, etc. - affected all of those changes, most often making hard things easy, but sometimes making easy things complex.
The previous post took us through the implementation phase - the next step was to launch the product. The stakes were high - our new system managed a critically important business process (described in the first post), so we needed to make sure everything ran well. To better understand how the system would behave under production traffic, we put it through a series of load tests of increasing complexity and load. This allowed us to catch a few issues that, had they manifested in production, could have caused significant downtime and losses.
In the first part, we took a look at how Akka features help us achieve our Persistence, Consistency, and Availability goals. In this part, we’ll continue exploring the implementation and focus on how Akka helped us handle requests and achieve the required performance levels.