Yahoo’s Zuoyun Jin discusses his work as a machine learning research engineer, the most important programming languages and why almost everything comes back to data.
While the AI arms race has well and truly begun when it comes to advanced natural language models, AI technology has been embedded in many aspects of tech for a long time and this technology is trained using machine learning.
Zuoyun Jin works as a machine learning research engineer in Yahoo’s demand-side platform (DSP) research and development team.
This is a programmatic advertising platform that helps advertisers automatically place advertisements on internet ad inventory through OpenRTB (real-time bidding).
‘AI/data analytics research has shifted toward developing more scalable and affordable solutions’
– ZUOYUN JIN
If there is such a thing, can you describe a typical day in the job?
Although it is not mandatory, I have chosen to work in the office two or three days a week since the office reopened post pandemic. I normally arrive at the company an hour before my work starts. I go to the gym to exercise for about half an hour and then enjoy breakfast while chatting with my colleagues.
Then I check emails, calendar, messages, etc, and make a to-do list for the day. I use G Suite to keep these lists as well as cheatsheets (all the knowledge I have learned and the information required for my work). I usually set two hours of focus time before and after lunch to start or continue existing tasks while there are no meetings.
Yahoo offers fitness training courses every Wednesday at the gym from 12pm to 1pm and 1pm to 2pm and I sometimes pick one of them before lunch.
My afternoons and evenings are busier than mornings because all my teammates are based in the US and it is hard to get uninterrupted time. So, outside of my focus hours, I spend this time making reports, writing documents, evaluating experiment results and discussing or planning tasks with my team.
Most of my meetings are set for the late afternoon or later. Unlike software developers, I do not have official daily stand-up meetings, but whenever my team needs to discuss anything together, we always schedule a meeting time that works for everyone. Apart from that, I have scheduled weekly team meetings every Monday and Thursday.
Also, every Friday before I finish my work, I like to make a record of the completed tasks and tasks in progress to make things easier for me when the new week starts.
What types of machine learning projects do you work on?
Most of my work is in the bid shading domain. Bid shading is a practice in first-price auctions, where the winner pays exactly what they bid, so bidders try to lower their bid to avoid overpaying. For example, a bidder offers €20 for an item that they believe is worth €30 because they are confident that €20 is enough to win the auction. At the same time, bidding lower also increases the chance that they lose the auction, as others can offer more.
The example I mentioned above is a very simple scenario. On Yahoo DSP, we do bid shading fully automatically on a massive scale of 1m to 5m requests per second. The ultimate goal of our projects is to find the best way to predict the distribution of the probability of winning an auction given the bid price and request predicates, while optimising the process at the same time.
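The trade-off in the simple example above can be sketched in a few lines of Python. This is a toy illustration only, not Yahoo's method: the logistic win curve and its `midpoint` and `steepness` parameters are assumptions made up for the example, and a real system would learn the win probability from auction data.

```python
import math

def win_probability(bid, midpoint=18.0, steepness=0.4):
    """Hypothetical logistic win curve: the chance of winning rises with the bid."""
    return 1.0 / (1.0 + math.exp(-steepness * (bid - midpoint)))

def expected_surplus(bid, value):
    """In a first-price auction the winner pays their own bid,
    so the surplus on a win is (value - bid)."""
    return (value - bid) * win_probability(bid)

def best_shaded_bid(value, step=0.01):
    """Grid-search bids in [0, value] for the highest expected surplus."""
    n_steps = int(round(value / step))
    candidates = [i * step for i in range(n_steps + 1)]
    return max(candidates, key=lambda b: expected_surplus(b, value))

if __name__ == "__main__":
    value = 30.0  # what the bidder believes the item is worth
    bid = best_shaded_bid(value)
    print(f"value={value:.2f} shaded bid={bid:.2f} "
          f"win probability={win_probability(bid):.2f}")
```

Bidding the full €30 always gives zero surplus, while bidding too low rarely wins, so the best bid under this assumed curve lies somewhere in between.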
I particularly enjoyed working on a project where we designed two models to compute the optimal bid price for complex bid situations such as two types of auctions that happen sequentially given only partial information.
What skills do you use on a daily basis?
I think this varies depending on the project. Typical tasks in my job include data preparation, model design, model training, model evaluation, model optimisation and parameter tuning, end-to-end model productionisation, and monitoring (measuring model performance). Having a solid knowledge of machine learning and data science is a requirement for these tasks.
Being familiar with big data and Hadoop technology is also required. Every day, we receive a massive amount of bid requests (nearly 1m requests per second on average), and 4pc of them are logged into our Hadoop file system. I often need to read different tables and models for data mining tasks. For some projects this can be a daily routine.
As a research engineer, I am also responsible for building automated model training pipelines and deploying models to our server.
Python and Apache Pig are the languages we use for our offline model training and data preparation. I had never used Apache Pig before I joined Yahoo. However, with the advice and help of my mentor at that time, I learned Pig scripting in a short period of time, and was able to quickly use it in an actual project while improving my scripting skills. Yahoo always encourages senior people to mentor new hires.
As in any tech role, communication and collaboration are always crucial. For example, many projects I have done require collaborations between various organisations.
During those projects, I had meetings with people across different departments, such as engineers, scientists and product managers, to understand the context, business goals and risks, technical constraints and ethical risks, and to make sure everyone was on the same page.
What are the hardest parts of working in machine learning?
People often say that machine learning is all about the data, and I have to say that I agree to some extent. I believe it is as important as model training since they are interwoven. In the projects I have been involved in, data processing is the most basic and common step of the project.
Although this process is tedious and time-consuming, the quality of the data directly affects the performance of the AI model. I often encounter poor model performance in my work and eventually find it to be a data issue. In fact, a lot of the raw data we collect has inconsistent formats and errors.
When I start a project that uses features that have never been tested, the first thing I do is study the data, run sanity checks and do some experiments to make sure the data source is reliable, accessible, relatively clean, secure, well-governed and free of GDPR consent issues. This part can sometimes take nearly half of the project time.
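As a rough illustration of what such sanity checks can look like, here is a minimal Python sketch that counts missing fields, values that fail to parse as numbers and values outside an expected range. The field names (`request_id`, `bid_price`) and the allowed range are hypothetical, chosen only for the example, and do not describe Yahoo's actual tables.

```python
def sanity_check(rows, required_fields, numeric_ranges):
    """Count common data problems in a list of dict records."""
    issues = {"missing_field": 0, "bad_type": 0, "out_of_range": 0}
    for row in rows:
        # A required field counts as missing if absent, None or empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                issues["missing_field"] += 1
        # Numeric fields must parse as numbers and fall within [low, high].
        for field, (low, high) in numeric_ranges.items():
            value = row.get(field)
            if value in (None, ""):
                continue  # nothing to parse; counted above if the field is required
            try:
                number = float(value)
            except (TypeError, ValueError):
                issues["bad_type"] += 1
                continue
            if not low <= number <= high:
                issues["out_of_range"] += 1
    return issues

if __name__ == "__main__":
    sample = [
        {"request_id": "a1", "bid_price": 1.2},
        {"request_id": "", "bid_price": "0.8"},     # missing request_id
        {"request_id": "a3", "bid_price": "n/a"},   # price is not a number
        {"request_id": "a4", "bid_price": -5},      # price out of range
    ]
    print(sanity_check(sample, ["request_id", "bid_price"],
                       {"bid_price": (0.0, 100.0)}))
```

A report like this does not fix anything by itself, but it shows quickly whether a new data source is clean enough to build a model on.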
How has this role changed as the sector has grown and evolved?
I have been working full-time for a year. Based on what I saw, the investment in AI/data analytics research has shifted toward developing more scalable and affordable solutions, such as unified solutions to reduce the cost of system resources and computation power. Our bid shading platform is transitioning to a ‘pay as you use’ subscription model.
Another big change is that we are about to move into a cookieless world. By the end of 2024, Google will phase out third-party cookies and advertisers will have to look for solutions to keep delivering ads to audiences that could potentially be of interest to them.
The contextual targeting approach is one of the possible solutions, in which advertisements would be selected by an algorithm and displayed to the user based on the actual website content.
This way, users will no longer be getting ads based on their personal data that had been collected, but rather those related to the visited website content.
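The idea of contextual targeting can be shown with a very small Python sketch that picks the ad whose keywords overlap most with the words on the page. The ad names and keyword lists are invented for the example, and simple keyword overlap is a stand-in technique: production systems would typically use far richer content models.

```python
import re

def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def pick_ad(page_text, ads):
    """Return the ad name whose keywords overlap the page the most,
    or None if no keyword matches. No user data is involved."""
    page_words = tokenize(page_text)
    best_name, best_score = None, 0
    for name, keywords in ads.items():
        score = len(page_words & set(keywords))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

if __name__ == "__main__":
    # Hypothetical ad inventory: ad name -> keywords describing the ad.
    inventory = {
        "running_shoes": ["marathon", "running", "trainers"],
        "cookware": ["recipe", "oven", "pan"],
    }
    page = "Five tips for your first marathon: running plans and recovery"
    print(pick_ad(page, inventory))
```

The selection depends only on the page content, which is what separates this approach from cookie-based targeting.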
What do you enjoy most about working in machine learning?
I am a member of the R&D team which is in the research organisation at Yahoo. Due to the nature of research projects, it sometimes takes a long time to find the optimal solution.
When a machine learning model I proposed after many experiments finally proves that it solves practical business problems, the sense of accomplishment is incomparable.
Also, Yahoo is committed to providing a fair and meaningful environment for every individual. I not only need a good working environment but also hope to find a sense of mission in the workplace.
Yahoo encourages us to actively participate in charity, community services, fitness, education and art. These experiences have not only improved my overall health and team performance but also brought unique insights into the situations I have encountered in my work.
What advice would you give to someone who wants to work in AI?
First, make a long-term learning plan and stick to it. Keep in mind that the field of AI is developing rapidly. Continuously learning the latest developments in the area of AI that interests you will give you a big advantage in the area you want to work in.
In addition, proficiency in a programming language such as Python is a must for becoming an AI/analysis practitioner.
Finally, master the classic machine learning theory and basic algorithms. Essentially, all advanced machine learning model architectures and algorithms are extensions of, or built on top of, classical ones.