Research is exciting. Research in Natural Language Processing (NLP) is very exciting, which explains why it gets so much publicity. Yet despite the enormous advances made by the academic world, specifically in the field of NLP, those advances rarely solve real-world problems directly. The academic world provides the necessary tools, which appear magical to those who first discover them; they are, however, not self-sufficient. Practical implementation of these tools is at least as important as the conceptual idea, if not more so.
Just over a year ago, I joined Della as a newly graduated Data Scientist, eager to put into practice all I had learned at university. I studied Mathematics, Physics and, more recently, Artificial Intelligence and Machine Learning. Although it may seem obvious to experienced engineers, the importance of practical implementation of NLP techniques is the main takeaway of my professional experience so far. It continues to amaze a young mind like mine, which is why I deem it worth sharing, especially for students reaching the end of their studies and pondering the question ‘what shall I do next?’ This was the very question I faced a year ago: should I move into industry or stay in the academic world? I hope my story can help those in a similar quandary.
Concepts vs practicalities
As a student, I had often pondered the benefit of studying concepts and theories without seeing these concepts come to life through real practical examples. Since academic education is predominantly the teaching of ideas, I knew that, when my time came to make the choice, I needed to opt for a career involving some technical hands-on work. So I decided to find a job as a Data Scientist rather than applying for a PhD.
I had a strong suspicion that spending time on a product, on real, concrete applications, was going to offer a whole new perspective on Machine Learning that I had yet to experience. I knew that academic projects and studies could be rewarding, but my issue with them was that they rarely involved working as a team and they were never about meeting someone’s actual needs. It was therefore very appealing to find out what client feedback was going to entail. My experience thus far at Della has proven that I was right to imagine that it would often mean finding the simplest solutions to precise issues.
My main concern about moving into industry was that all those years studying complex concepts would have no impact on my day-to-day role. I can report that this is not true. Far from having forgotten all of it, I find that my job requires me to make the most of my academically-acquired knowledge and to keep up to date with all the latest developments in NLP. My role is nonetheless very different from that of an NLP researcher in academia, which I assume would involve breaking and rebuilding concepts in an attempt to beat the performance of current state-of-the-art techniques.
NLP put into practice
Let me illustrate this with my own personal experience. At Della, our aim is to assist contract review, and currently we do this through Question Answering (QA) systems. Put simply, you can ask Della a question and our AI is trained to find the answer for you. We also build additional features to provide clients with a solution that can help them extract the information they need from their contracts as efficiently as possible.
One such feature requested by one of our clients is party detection. A contract is always made between different parties and the rather simple task (for a human) of detecting who these parties are along with their meta information (such as address, country of incorporation and company registration numbers) appeared, at first, to be a simple Named Entity Recognition (NER) problem. Quite naturally therefore, an NER model was implemented and added to our platform when this need was first mentioned (which, for context, was before I joined Della). Surprisingly, the efficacy of the NER model was not what we hoped for and we did not understand why. We had plenty of training data, we had a model praised by the research world and we had a process that was working well for our QA systems. We investigated the party detection problem multiple times, but we could not find out what was going wrong. The frustration was all the more bitter as NER is now considered a simple task by the academic world.
As is often the way with these things, the solution came about while we were addressing other issues and fixes. I was assigned to work on another project: it was very important for one of our clients to delete certain data points, for example contracts which contain sensitive information. Deleting them outright, however, would also remove data that our models could learn from.
What if we could copy data with faked entities whilst preserving their legal characteristics? Clients could verify that no sensitive data remains, and we would be left with some fuel to keep Della’s brain ticking. Seduced by this idea, we put our energy into thinking of ways to achieve it. Again, a simple NER task seemed the most promising solution. Surprise! It worked very well. We are confident in the capacity of our NER model to highlight names, organisations, dates, identification numbers, amounts of money, etc. Yet we had used very little data. This was when the flaw of our party detection model struck us: we did not lack data, but we suffered from class imbalance. While contracts are typically lengthy documents, the parties are named only a few times in the text. Hence, there is a strong imbalance between the number of words labeled as parties and the rest of the words.
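To make the imbalance concrete, here is a minimal sketch of how one might measure it. This is not Della's actual code: the `PARTY` and `O` label names, the `label_distribution` helper and the toy sentences are all assumptions made up for illustration.

```python
from collections import Counter


def label_distribution(tagged_sentences):
    """Return the share of tokens carrying each label."""
    counts = Counter(
        label for sentence in tagged_sentences for _, label in sentence
    )
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical toy "contract": only the opening clause names the parties.
contract = [
    [("This", "O"), ("Agreement", "O"), ("is", "O"), ("made", "O"),
     ("between", "O"), ("Acme", "PARTY"), ("Ltd", "PARTY"), ("and", "O"),
     ("Bolt", "PARTY"), ("Inc", "PARTY")],
    [("The", "O"), ("Supplier", "O"), ("shall", "O"), ("deliver", "O"),
     ("the", "O"), ("goods", "O")],
    [("Payment", "O"), ("is", "O"), ("due", "O"), ("within", "O"),
     ("thirty", "O"), ("days", "O")],
]

shares = label_distribution(contract)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.1%}")
```

Even in this tiny example most tokens are not parties; in a real contract of many pages, the share of party tokens would be far smaller still.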
Our anonymiser, on the other hand, had more varied data to work with and was not limited to words classified only as parties. By spending some time filtering the party detection data, weighting it, undersampling some of it, etc., we managed to significantly improve the model's performance. Additionally, we learned along the way never to underestimate the importance of the quality of the training data. By making the most of another success, we were able to enhance our client’s experience. And yet it did not involve anything particularly groundbreaking in the academic sense.
Of course, this project, just like any other, still deserves more attention and it will surely be improved in other ways. The valuable lesson I have learned, in the spirit of Occam's razor, is that the simplest option is often the one that works best.
I am fascinated by research in NLP and Machine Learning in general. I read voraciously, my team and I regularly share articles, and I study the latest advancements in the field with vigour. But I find it just as exciting to find simple solutions that have a huge impact, and in that respect, working as an engineer at Della and solving real-world challenges has been an incredibly rewarding career path for me.