Transfer Learning – Making Machine Learning Models Work Even with Insufficient* Labelled Data

1. Introduction

Let us start the story with a data science project that predicts users' credit scores from telco data in country A. It is a successful machine learning (ML) project because we have sufficiently large and comprehensive labelled data.

Then, the business team expects the data science team to quickly duplicate the same model for a new market in country B. However, we can't, because the currency and consumer behaviour of country B differ from those of country A.

I believe this story is common in data science companies. Generally, an ML model is built on the assumption that the training and test data are drawn from the same feature space and the same distribution. In other words, once the distribution shifts, the model fails.


Researchers have long studied this problem, and one solution is called transfer learning. In layman's terms, we have labelled data from a source domain and would like to build an ML model for a target domain whose task or data distribution differs from that of the source domain (Pan and Yang, 2010).

In this article, we experiment with a transfer learning method proposed by Hal Daumé III (2006), named easy adaptation (the name was coined in his later paper). In the following, we briefly explain easy adaptation in Section 2 and the experiment in Section 3. Finally, we draw conclusions in Section 4.

2. Transfer Learning with Easy Adaptation

Easy adaptation has a simple construction, described in Daumé III's paper “Frustratingly Easy Domain Adaptation”. Say that we have labelled data in the same feature space (attributes x0 and x1 with output y) in both the source and target domains, but with different distributions (see Figure 1). For instance, the second record of the source domain data is (x0=2, x1=20, y=2), but the same inputs give y=1 in the first record of the target domain data.

Figure 1: Easy adaptation on hypothetical tables from the source and target domains.
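A minimal sketch of the augmentation step, assuming the feature-augmentation construction from “Frustratingly Easy Domain Adaptation”: every feature vector x is copied into a shared block plus a domain-specific block, so a source row becomes ⟨x, x, 0⟩ and a target row becomes ⟨x, 0, x⟩. Any standard learner trained on the augmented rows can then decide which weights are shared across domains and which are domain-specific. The helper names `augment` and `augment_dataset` are hypothetical, and this is not the code used in the experiment of Section 3.

```python
from typing import List, Sequence

Vector = List[float]


def augment(x: Sequence[float], domain: str) -> Vector:
    """Map a feature vector x into the augmented space of easy adaptation.

    Source rows become <x, x, 0> and target rows become <x, 0, x>:
    the first block is shared by both domains (general features),
    the second block is source-only and the third is target-only.
    """
    x = list(x)
    zeros = [0.0] * len(x)
    if domain == "source":
        return x + x + zeros
    if domain == "target":
        return x + zeros + x
    raise ValueError(f"unknown domain: {domain!r}")


def augment_dataset(source_X: Sequence[Sequence[float]],
                    target_X: Sequence[Sequence[float]]) -> List[Vector]:
    """Stack augmented source rows and target rows into one training matrix."""
    return ([augment(x, "source") for x in source_X]
            + [augment(x, "target") for x in target_X])
```

For the Figure 1 inputs (x0=2, x1=20), `augment([2, 20], "source")` returns `[2, 20, 2, 20, 0.0, 0.0]`, while the same inputs from the target domain become `[2, 20, 0.0, 0.0, 2, 20]`. Because the two augmented rows differ, a learner is no longer forced to give them the same prediction, yet it can still share what the domains have in common through the first block.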



How powerful is data?


The following map presents a very small sample of the distribution of student accommodation for one of the universities in Malaysia, based on ada's mobile ad exchange data. The students and their approximate residence spots are identified by geofencing their daytime and night-time location updates.
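The post does not describe the actual pipeline beyond geofencing day and night location updates, so the following is only a hypothetical sketch of the idea: treat a device as a student if it reports enough daytime pings inside a circular campus geofence, then take its most frequent night-time grid cell as the approximate residence. The record layout, the hour windows, the 1 km radius, the minimum ping count, and the grid size are all assumptions, and the function names are illustrative. Only opaque device IDs are used, in line with the note that owners' identities are not unlocked.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (device_id, hour_of_day, latitude, longitude)
Ping = Tuple[str, int, float, float]

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_daytime(hour: int) -> bool:
    return 8 <= hour < 18  # assumed class hours


def is_night(hour: int) -> bool:
    return hour >= 22 or hour < 6  # assumed sleeping hours


def infer_residences(pings: Iterable[Ping],
                     campus: Tuple[float, float],
                     radius_m: float = 1000.0,
                     min_campus_pings: int = 3,
                     cell_deg: float = 0.005) -> Dict[str, Tuple[float, float]]:
    """Return {device_id: approximate residence cell} for likely students."""
    campus_hits: Counter = Counter()
    night_cells: Dict[str, Counter] = defaultdict(Counter)
    for device, hour, lat, lon in pings:
        if is_daytime(hour) and haversine_m(lat, lon, *campus) <= radius_m:
            # Daytime ping inside the campus geofence
            campus_hits[device] += 1
        elif is_night(hour):
            # Snap night-time pings to a coarse grid cell
            cell = (round(lat / cell_deg) * cell_deg, round(lon / cell_deg) * cell_deg)
            night_cells[device][cell] += 1
    residences = {}
    for device, hits in campus_hits.items():
        if hits >= min_campus_pings and night_cells[device]:
            # Most frequent night-time cell approximates the residence
            residences[device] = night_cells[device].most_common(1)[0][0]
    return residences
```

Snapping to a coarse grid, rather than returning raw coordinates, also keeps the output at the "approximate residence spot" level of detail that the map shows.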

Some highlights:

  • Unsurprisingly, most students stay near campus. Some prefer driving, which explains why the university's car park is always full.
  • The full, detailed map (not shown here) can be used to plan university shuttle bus routes and to identify students' favourite hangout spots.
  • We do not unlock the identities of the mobile phone owners (a.k.a. Pandora's box). We know where they are, but not who they are.

The work is credited to the ada Data Science and Engineering team.

Original post at https://www.linkedin.com/feed/update/urn:li:activity:6412478258041970688

 

The Things about Job Titles

[My original post from LinkedIn: https://www.linkedin.com/pulse/things-job-title-zan-kai-chong/]

Switching my job title from data scientist to machine learning engineer amuses a lot of my friends. They wonder since when I became a lecturer again (note: machine "learning"). Despite my wrongful explanation, I started wondering what my real job title is, other than the words printed on my name cards.

Analytically, I should list all my job functions, build a heat map or histogram from all the words in their descriptions, and then identify the common words by applying the max(count) or corr function. Okay, sounds right. Here we go.

First, I work on the AWS platform. As a trustworthy power user, I stress-test the costly computing instances and provide helpful IT support to newcomers (ladies preferred). I also occasionally use EMR (very expensive computer clusters) like a boss for big data stuff. Occasionally, I speak AWS jargon as if I were a real AWS engineer.

Okay. You saw the words “big data”. Of course, I am (acting like) a big data engineer, as I work on peta-ful (a new word describing petabytes of) data. This petaful data is our asset for tracking you. We may not be as good as Cambridge Analytica, but we know many things about you, including what you did last weekend. The more attached you are to your phone, the more we know about you.

In short, it is fun to work in an analytics company. Well, my real job title? I am wondering as well. Perhaps I should just call myself an “engineer“.

I was happily self-absorbed until my civil engineer friends started mocking me with a photo.

Venturing into Data Science

After seven years of academic life at UTAR, I decided to move on to the data science industry to explore the opportunities in big data transformation.

It was a hard but necessary move for me. I will leave the full story for an offline, face-to-face discussion if our frequency and space-time align.

Here are my observations after six months of working in the data science industry. The majority of Malaysian industries are business-driven entities: business comes first and research second (or last). R&D (or r&D) departments usually struggle to survive the evolution (a.k.a. company restructuring or reorganization), because their output is always less convincing in board meetings. A common practice is to embed the R element in product development so that there is some tangible output.

Another interesting thing is that the meaning of the term research varies a lot in industry. It can refer to operational research, product research, applied research, etc. It is definitely not the kind of research that lets you sit down for a whole month just to derive an elegant equation that is of little use to them.

After all, I am the latter type of person. I guess it will take another few months before my boss realizes that I am working on a niche research topic instead of building the requested machine learning model.

 

Hiring a Research Assistant

We are looking for ONE candidate who is

  • Good at programming, mathematics, microcontroller systems, and the principles of network communication
  • Proficient in English
  • Disciplined and independent

to work with us on improving the performance of the Internet of Things (IoT) with locally decodable codes. The successful candidate will be paid RM 2,500 for 12 months, renewable for another year (1+1 policy). He or she is expected to register for the Master of Engineering Science at the Lee Kong Chian Faculty of Engineering Science (LKC FES) and complete the study within 24 months.

Knowledge of network communication and coding theory is preferable, but not a must. Registration for the Master of Engineering Science at LKC FES is compulsory.

BRIEF DESCRIPTION OF THE PROJECT

We consider a mobile wireless sensor network (MWSN) that consists of thousands of static sensor nodes and one or more mobile sinks (mobile base stations). Such dynamic networks are common in IoT applications, for example users with wireless wearable devices walking on streets or shopping at outlets, where the wearable devices act as mobile sinks that continuously fetch environmental sensor data in order to provide ubiquitous services to users.

The candidate will work with the team to design the communication protocol, implement the testbed on Raspberry Pi, etc. Minimal logistics work may be required.

The team members are Dr. Chong Zan Kai, Prof. Ir. Dr. Goi Bok Min, Prof. Ir. Dr. Ewe Hong Tat, Dr. Lai An Chow, Dr. Goh Hock Guan and Ms. Tan Lyk Yin. This is also a collaborative project with researchers from Kwansei Gakuin University, Japan, and Victoria University of Wellington, New Zealand.

Interested candidates should send their resumes to Dr. Chong Zan Kai at chongzk@utar.edu.my.

Note: This call is now closed. Thank you.

Call for a Research Assistant at UTAR

We are looking for ONE candidate who is

  • Good at programming and mathematics
  • Willing to learn
  • Proficient in English
  • Disciplined and independent

to work on a research project that improves future network throughput with computing accelerators. The successful candidate will be paid RM 2,500 for 12 months (renewable for another year). Knowledge of parallel processing and coding theory is preferable, but not a must. The successful candidate is expected to register for the Master of Engineering Science at the Lee Kong Chian Faculty of Engineering Science (LKCFES). LKCFES FYP-2 students are encouraged to apply.

Description of the Project

A rateless erasure code is a kind of error-correction code in which the original message can be reconstructed from only a fraction of the encoded symbols the sender can generate. The emergence of rateless erasure codes promises better network throughput, but they are constrained by the bottleneck in their encoding and decoding speed.
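To make "reconstructed from a fraction of the encoded message" concrete, here is a minimal sketch of a random linear fountain code over GF(2), one of the simplest rateless erasure codes. It is a swapped-in textbook example, not the LT or Raptor codes used in practice and not the project's GPU implementation; the function names, the use of integers as data blocks, and the 30% erasure rate in the usage example are assumptions. The encoder emits an unbounded stream of XOR combinations of the k source blocks, and the receiver decodes by Gaussian elimination over GF(2) as soon as the symbols it happened to receive span all k blocks, which usually takes just a few more than k symbols regardless of which ones were lost. The elimination step is also a small illustration of why decoding speed becomes the bottleneck as k grows.

```python
import random
from typing import Iterator, List, Optional, Tuple

# An encoded symbol: (bitmask of the source blocks it combines, XOR of those blocks)
Encoded = Tuple[int, int]


def encode_stream(source: List[int], seed: int = 0) -> Iterator[Encoded]:
    """Yield an endless stream of encoded symbols (the 'rateless' part)."""
    k = len(source)
    rng = random.Random(seed)
    while True:
        mask = rng.getrandbits(k)
        if mask == 0:
            continue  # an all-zero combination carries no information
        payload = 0
        for i in range(k):
            if (mask >> i) & 1:
                payload ^= source[i]
        yield mask, payload


def decode(received: List[Encoded], k: int) -> Optional[List[int]]:
    """Recover the k source blocks by Gaussian elimination over GF(2).

    Returns None while the received symbols do not yet span all k blocks.
    """
    pivots = {}  # highest set bit -> (mask, payload) row with that leading bit
    for mask, payload in received:
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = (mask, payload)
                break
            pmask, ppay = pivots[top]
            mask ^= pmask  # eliminate the leading bit
            payload ^= ppay
        # a row reduced to zero was linearly dependent and is discarded
    if len(pivots) < k:
        return None
    # Back-substitution: each pivot row only involves lower, already-solved bits
    solved = [0] * k
    for top in range(k):
        mask, payload = pivots[top]
        rest = mask & ~(1 << top)
        j = 0
        while rest:
            if rest & 1:
                payload ^= solved[j]
            rest >>= 1
            j += 1
        solved[top] = payload
    return solved


if __name__ == "__main__":
    source = list(b"Hello, rateless!")
    k = len(source)
    channel = random.Random(42)
    received: List[Encoded] = []
    result = None
    for symbol in encode_stream(source, seed=7):
        if channel.random() < 0.3:
            continue  # symbol erased by the channel
        received.append(symbol)
        if len(received) >= k:
            result = decode(received, k)
            if result is not None:
                break
    print(bytes(result), "decoded from", len(received), "received symbols")
```

Re-running `decode` after every new symbol keeps the example short; an incremental decoder would keep the pivot table between calls instead.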

The candidate needs to improve the encoding and decoding speed of rateless erasure codes with a graphics processing unit (GPU) and apply them to network communication. Some logistics work may be required.

The team members include Chong Zan Kai, Prof. Goi Bok Min, Prof. Ewe Hong Tat, Dr. Lai An Chow and Yap Wun She.

Interested candidates should send their resumes to Chong Zan Kai at chongzk+ra@utar.edu.my.

Download the PDF here: Call for RA in UTARRF (2015).

Note: This call is now closed. Thank you.