Creating AI-powered technologies to create a whole new world of convenience – Samsung Global Newsroom
Continued from Episode 1
In this relay series, Samsung Newsroom features technical experts from Samsung R&D centers around the world to learn more about their work and how it directly improves the lives of consumers.
The second expert in the series is Lukasz Slabinski, head of the artificial intelligence team at Samsung R&D Institute Poland (SRPOL). Slabinski joined SRPOL in 2013 as a senior engineer and, after 8 years of dedicated work, now leads the SRPOL AI team. Read on to learn more about the exciting innovations that Slabinski and his team are working on at SRPOL.
Q: Designing solutions for speech recognition is known to be very complex. When working on language-related technologies, what challenges have you encountered and how have you overcome them?
In my opinion, the technologies related to language are much more complex than any other. Humanity communicates in nearly 7,000 constantly evolving languages, subdivided into endless accents and dialects. In addition, human language is much less objective than, for example, an image, which can be described in mathematical formulas. People encode their thoughts as a set of sounds or characters in a message, which must then be decoded and interpreted by others. Because each phase of this process is personal, creative, and non-deterministic, human language-based communication is very complex and ambiguous. So, on the one hand, we can appreciate beautiful poetry and funny jokes, and on the other hand, sometimes suffer from misunderstandings.
People in R&D who work on natural language processing (NLP) often reach their own limits, which are inherently human. Even we find it difficult to communicate clearly with our co-workers or our family at home. So how, for example, can an engineer who speaks 2 languages design and code a machine translation system for 40 different languages? We solve this paradox using machine learning technologies.
During the process known as ‘training’, we automatically extract general patterns from examples in our data sets and store them as a model. To build a machine translation system, we train a neural network to map sentences between languages based on millions of examples, all carefully collected and cleaned up beforehand. It sounds easy, but here we are dealing with three fundamental challenges.
The first challenge is designing an appropriate machine learning model architecture, one capable of memorizing and generalizing language patterns well enough for a given problem such as machine translation, sentiment analysis or text synthesis.
The second challenge is preparing a sufficient amount of training data, as machine learning systems can only recognize and remember the patterns presented in the training data set.
The final challenge is the deployment of an already trained machine learning model on a dedicated cloud platform or on the device.
We meet these challenges by harnessing the vast expertise of our engineers, sophisticated approaches to data collection, and endlessly experimenting with cutting-edge machine learning architectures.
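The ‘training’ idea described above can be sketched in a drastically simplified form. The toy code below is not SRPOL's system: a real machine translation system trains a neural network on millions of sentence pairs, while this illustration merely counts word correspondences in a handful of aligned example sentences and stores the result as a lookup table standing in for the "model". All function names and example pairs are invented for illustration.

```python
from collections import Counter, defaultdict


def train(parallel_pairs):
    """Learn a word-for-word translation table from aligned sentence pairs.

    Naive alignment: assumes the two sentences have the same word order
    and length, which is rarely true in practice.
    """
    counts = defaultdict(Counter)
    for src, tgt in parallel_pairs:
        for s, t in zip(src.split(), tgt.split()):
            counts[s][t] += 1
    # The stored "model": the most frequent target word per source word.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def translate(model, sentence):
    """Apply the learned table, leaving unknown words unchanged."""
    return " ".join(model.get(w, w) for w in sentence.split())


# A tiny English-to-Polish "training set".
examples = [
    ("green tea", "zielona herbata"),
    ("black coffee", "czarna kawa"),
    ("green apple", "zielone jabłko"),
]
model = train(examples)
print(translate(model, "black tea"))  # -> czarna herbata
```

Even this toy handles combinations it never saw during training ("black tea", "green coffee"), which hints at the pattern-extraction-and-generalization idea Slabinski describes; neural models do the same at vastly greater scale and with far more flexibility.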
Q: Can you briefly introduce your AI team, Samsung R&D Institute Poland (SRPOL) and the type of work that takes place there?
SRPOL is one of the largest international software R&D centers in Poland. It is located in two cities: Warsaw, the capital of Poland, and Krakow, a major technological hub in its region. We work closely with local start-ups, universities and research institutes.
The mission of the SRPOL AI team is to create AI-powered features, tools and services that ease and enrich human life. We mainly focus on the areas of NLP and audio intelligence, but we also have expertise in many other specialties, including recommendation systems, indoor positioning, visual analysis and augmented reality.
Q: As the Polish Institute’s AI Team Leader since 2018, you have overseen a myriad of projects with and without a focus on NLP. What are you working on with your team now?
When it comes to the field of NLP, we are continuing a journey that began over 10 years ago with the development of systems such as machine translation and dialogue systems, including question answering and text analysis. We work on both scalable, powerful cloud-based services and fast, offline on-device apps.
Audio intelligence is a newer area for us. We started to focus our research capabilities there several years ago as the field grew in importance. Currently, we are working on the recognition, separation, enhancement and analysis of sound. In our work, we take into account all levels of audio processing, from understanding the acoustic scene to fine-tuning audio algorithms on devices with very limited hardware resources, such as wireless headphones.
Q: Your technology priorities include NLP, text and data mining, audio intelligence, and more. Did your research directly affect the development of a specific Samsung product or service, and what benefit has your team’s contribution provided to users?
SRPOL has a long history of commercializing AI technologies, but we haven’t done it alone. We are proud to be part of a larger whole, in which SRPOL works closely with other Samsung R&D centers and contributes to commercialization.
For example, we have helped develop several smart text input features for Samsung mobile devices, including the onscreen keyboard, the hashtag function, Samsung Notes title recommendations and smart text replies on smartwatches.
We also contributed to the Galaxy Store recommendation system, which suggests the most interesting games to a user based on their preferences.
Q: As an advocate for new areas of AI such as audio intelligence, what do you think are the main trends in your industry right now? How will this technology affect people’s daily lives?
I think audio intelligence will be a game changer for all consumer electronics devices. Working on audio analysis is extremely important, as it is the missing piece of truly advanced, human-centric AI-based systems.
Powerful NLP systems analyze user intent as expressed through text and speech. Computer vision algorithms underpin almost everything we do with cameras and visual content. For most of us, it’s hard to imagine driving a car without navigation, typing a message without a spell checker, or searching for information without the internet. But, except for a few professional apps, we have so far very rarely used intelligent audio technology to improve our hearing. In my opinion, that should change soon.
Suppose we had a commonly available technology that allowed people to choose what and how they want to hear. For example, while having lunch with a friend in a park in a busy city center, someone might choose to hear only the sounds of nature and the person they are talking to. Or, imagine an advanced VR or AR system, recently called the Metaverse, that creates an immersive 3D audio experience right in people’s heads. These two concepts alone generate hundreds of possible new use cases, but let’s dig deeper. How about hearing things that are currently inaudible to people? Today, humans can hear only a narrow spectrum of sounds. Our world is full of meaningful sounds that, for the most part, current AI technologies do not yet engage with. With the development of audio intelligence technologies, I think all of this is going to affect people’s lives tremendously.
Q: How have you incorporated current trends into the research you do at Samsung R&D Institute Poland?
Besides NLP and audio, we are also working to find the most efficient ways to build truly multimodal systems. We do this by researching and analyzing use cases from different angles. Such analysis is made possible by our diverse and interdisciplinary team of engineers, linguists, data scientists and more.
Q: What has been your most important achievement at SRPOL so far?
It would be our machine translation solution, which has won various competitions for five consecutive years: the International Workshop on Spoken Language Translation (IWSLT) from 2017 to 2020, the Workshop on Machine Translation (WMT) in 2020, and the Workshop on Asian Translation (WAT) in 2021. These are among the most prestigious international competitions in our field.
Earning recognition at WAT this year was particularly satisfying, as developing our solution for Asian languages was originally a difficult feat for us as Polish engineers. But this achievement has proven the true power of our technology, which goes beyond a simple demonstration.
Another achievement that I am very proud of is the speed of growth that the audio intelligence team and its technological development have achieved. In just a few years, after starting from almost zero, we were able to step onto the podium of the Detection and Classification of Acoustic Scenes and Events workshop for two consecutive years, 2019 and 2020. We have also published several scientific papers and patents in this field. I’m sure this is just the start of our prolific activities in this area.
The next episode will feature an interview with Bin Dai, a machine learning expert at Samsung Research Institute China-Beijing.