How machine learning is stamping out plagiarism

In 2018, distinguished American economist Paul Krugman said, “everything that can be digitised, will be digitised, making intellectual property ever easier to copy”.

Today, Krugman’s words resonate in university and school staff rooms more than ever as students leverage the bottomless information treasure chest that is the internet to find, and plagiarise, content – often when cramming for last-minute exams.

For educators, combating this practice can be a time-consuming exercise, and as recent reports have shown, time is not something teachers have a lot of, as the vast majority are pushed to the brink with seemingly endless administrative duties.

One organisation has found a way for educators to let technology do the heavy lifting when it comes to the tedious but crucial task of detecting, and preventing, plagiarism.

Swedish company Urkund describes itself as “a fully-automatic machine learning text-recognition system made for detecting, preventing and handling plagiarism” in all languages.

Below, The Educator speaks to John Tsihlis, Chief Operating Officer at Urkund, about the software’s effectiveness, the impact of remote learning on maintaining academic integrity and how machine learning is being used to take anti-plagiarism technology to the next level.

TE: What impact has social distancing and remote learning had on organisational and technical support systems for academic integrity?

JT: When talking about academic integrity, research points to how important organisational context is for student honesty. This often comes down to the learning environment, something that is well considered and planned for in physical and online classes, but less so in emergency remote teaching. It would be wrong to use the teaching-from-home situation as a litmus test for any kind of innovative learning method like blended or distance learning. In these, the increase in social distance is compensated for in other ways, with tools supporting student engagement and motivation. Trust-building, timely feedback, use of multimedia and, of course, structure are things many have had to scramble together just to be able to keep teaching. Having the lion’s share of the students' output become digital makes it more susceptible to copy-pasting or even ghost-writing. That is why having a pedagogic approach to supporting academic integrity matters. Even working with honour codes, something shown to reduce cheating, will have less of an effect as social distance increases and peers feel more detached from the group. To make up for this, technical solutions like support systems that help you identify cases of plagiarism will raise awareness of academic honesty among both students and instructors. We do believe that as many now learn the hard way what needs to be in place for remote teaching to be successful, it will generate important lessons for how to use online or distance learning in the future.

TE: You say the sudden need for remote teaching highlights some key success factors for digital/online/blended/distance learning. What are some of these factors and how can educators leverage them for impact?

JT: The current situation is really taking remote teaching to the extreme. As social distancing recommendations ease, we’ll be able to take what we have learned from this and apply it in our usual learning environments. At Urkund we see exciting possibilities for more personalised learning experiences. The right digital tools to help instructors better connect with their students, regardless of whether the class consists of 15 or 50 students, will be a force for good in terms of student engagement and motivation. One way we are approaching this is through our research into stylometry and author support tools to weed out ghost-writing, which undermines the creation of knowledge, and instead guide students to become original thinkers. We’re also teaming up with cutting-edge researchers in cognitive science to further enhance the support of originality for each academic writer, no matter the level.

TE: Urkund has clearly been doing some very important work for schools and universities when it comes to plagiarism protection. What do you see as Urkund’s ‘stand out’ value for school and university leaders who have access to a very broad market?

JT: What we hear time and time again is how our customer-centric approach delivers on the key areas that institutions value in originality checkers. There’s the obvious one around accuracy, and here we stress that focusing only on higher match percentages is usually a race to the bottom in terms of actual usability for instructors. They need a system that highlights what they need to discuss with their students, without sifting through tonnes of false positives. From an operational point of view, reliability and interoperability are crucial. Even in the high-growth period we’re currently living through, we never short-change uptime or system stability, and we are strong proponents of LTI and of making our originality reports sit smoothly within whatever workflow our customers already have.

TE: I understand that Urkund is working on using machine learning combined with cognitive science for author recognition, both to help weed out ghost-writing and to be part of the shift to a more personalised education. Can you tell us more about this?

JT: We have used machine learning for longer than it has been a buzzword (we checked Google Trends and yes, it's true). The test-winning accuracy of our algorithms is built in part on machine learning. Recently our research has focused on how we can help identify cases of ghostwriting. We have been running tests internally around author recognition and so-called stylometry, and with an ever-increasing amount of academic output becoming digital, weeding out ghostwriting will soon be a possibility. Just as we have seen with the standard originality checks, introducing author metrics as part of the feedback will bring awareness to the practice and proactively work to prevent it. The word 'feedback' here is crucial, as we have always promoted a pedagogic approach to fighting for academic honesty. Together with partners that are leaders in big data and cognitive science, we aim through our research to continue being a coach for instructors as learners develop their academic voice.