PyData Yerevan 2022

Best practices for coding in ML/DS - Techniques to construct your project
08-12, 12:00–12:40 (Asia/Yerevan), 213W PAB

Many engineers, particularly those in Data Science, do not focus on writing better code that
their coworkers will love.

This is bad!

Writing cleaner code and using appropriate tools for experiment logging reduce debugging time
and the effort spent on the project in the long term. As a result, the code becomes more
readable and onboarding new engineers to the project becomes easier.

The target audience is beginner ML/DS practitioners who struggle to write cleaner code, data
scientists who are considering adopting better techniques and tools in their projects, and
students who are taking their first steps into the world of ML.

The only necessary knowledge is the ability to read and understand Python code, as all my
examples are written in Python. No other prior technical knowledge is necessary, as the talk is
fully introductory and stays at a high level.

By the end of the lecture, attendees will have learned about the importance of clean code in
an ML project. They will have developed intuition about writing readable and understandable
code, and will have acquired knowledge about the general design of a good codebase and about
some tools that help engineers log experiments for a cleaner environment.


A clean codebase and good tooling are necessary for iterating on a machine learning project
that has to adapt easily to new business needs, integrate new ideas, support experiments with
new ML models, and so on.

Many engineers underestimate the power of clean code and clean architecture, which in the long
run saves time and money.

During this talk, engineers will learn why better variable names matter and how to write them.
They will learn where, how, and when to put comments, as well as the importance of docstrings
and the tools that work with them. General principles of good code such as DbC and DRY will be
introduced, along with some intuition for SOLID. Some experiment logging techniques and tools
will also be covered.
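
As a rough illustration of the principles listed above (not an example taken from the talk
itself), the sketch below combines descriptive names, a docstring, DbC-style checks, and DRY
reuse. The function `normalize_scores` and the sample numbers are hypothetical and chosen only
for demonstration.

```python
from typing import List, Sequence


def normalize_scores(raw_scores: Sequence[float]) -> List[float]:
    """Scale scores linearly into the range [0, 1].

    Precondition (DbC): raw_scores is non-empty and holds at least two distinct values.
    Postcondition (DbC): every returned value lies in [0, 1].
    """
    # Check the preconditions explicitly instead of failing later with an obscure error.
    if len(raw_scores) == 0:
        raise ValueError("raw_scores must not be empty")
    lowest, highest = min(raw_scores), max(raw_scores)
    if lowest == highest:
        raise ValueError("raw_scores must contain at least two distinct values")

    score_range = highest - lowest
    normalized = [(score - lowest) / score_range for score in raw_scores]

    # Check the postcondition promised in the docstring.
    assert all(0.0 <= value <= 1.0 for value in normalized)
    return normalized


# DRY: both data splits reuse one function instead of a copy-pasted formula.
# The numbers below are made-up sample values.
train_scores = normalize_scores([2.0, 4.0, 6.0])
validation_scores = normalize_scores([10.0, 15.0, 20.0])
```

The descriptive names (`raw_scores`, `score_range`) make the arithmetic readable without extra
comments, and the docstring states the contract that the checks enforce.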

All technical concepts will be discussed at a high level, so no deep prior knowledge of OOP,
software design, or ML algorithms and tools is necessary. The talk is introductory and is
suitable for students as well.


Prior Knowledge Expected

No previous knowledge expected

Education
2017-2021 - YSU, Faculty of Physics, theoretical physics.
My diploma work was "Classification of Blazars with Machine Learning Techniques"
2021-present - YSU, Faculty of Mathematics and Mechanics, applied statistics and data science

Experience
2018 - present - ICRANet-Armenia, research assistant
I was engaged in manual astrophysical data analysis and then co-wrote software for automation, took part in various research in high-energy astrophysics, and applied machine learning to those data.
2020-2021 - EasyDMARC, Machine Learning Specialist
With the mentorship of a senior manager, I had to build an end-to-end machine learning solution for anomaly detection for an email cybersecurity platform.
2021-2022 - Krisp, Machine Learning Engineer, Computer Vision.
I worked on a human segmentation task for replacing the background in virtual meetings. Collaborating with QA, PM, and the research team, I contributed to the development of an existing technology.