Using JPlag for Automated Plagiarism Detection in Programming Assignments


Background: Teaching Programming at University Level

FIT9131 (Introduction to Programming) was a challenging Masters-level unit in the Master of Information Technology course at Monash University. I was a member of the teaching team for multiple semesters over several years, until 2019, and witnessed firsthand the struggles students faced with this demanding unit.

The unit’s reputation for difficulty was so well-known that external services began targeting students. Here’s an advertisement I once found in a university bathroom:

[Image: External tutoring advertisement in university bathroom]

External tutoring services targeting struggling students

Why JPlag for Code Similarity Detection?

The teaching team chose JPlag as our primary tool for several compelling reasons:

The Challenge: Processing Student Submissions

JPlag accepts a directory containing student assignments and produces HTML reports. However, the main challenge lies in preparing clean, standardized data for accurate analysis.

Common Submission Issues

Students often submit files with various problems:

Solution: Automated File Processing Pipeline

Step 1: Dependency Verification

First, we ensure all required tools are installed:

#!/bin/bash
# Check for dependencies
if [ $# -ne 0 ]; then
  echo "Error: No command line arguments needed" >&2
  exit 1
fi
# command -v exits non-zero when a tool is missing; its output is not needed
if ! command -v detox >/dev/null 2>&1; then
  echo "Error: Detox does not exist. Please do sudo apt install detox" >&2
  exit 1
fi
# The extraction step invokes 7-Zip as "7z", so check for that name
if ! command -v 7z >/dev/null 2>&1; then
  echo "Error: 7z does not exist. Please install 7zip" >&2
  exit 1
fi
if ! command -v unrar >/dev/null 2>&1; then
  echo "Error: unrar does not exist. Please install unrar" >&2
  exit 1
fi
# unzip is also used by the extraction step
if ! command -v unzip >/dev/null 2>&1; then
  echo "Error: unzip does not exist. Please install unzip" >&2
  exit 1
fi
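
The same check can be written as a loop that reports every missing tool in one pass instead of stopping at the first. This is a minimal sketch, not the script the teaching team used: `missing_tools` and `require_tools` are hypothetical helper names, and the tool list you pass in is up to you.

```shell
#!/bin/bash
# Sketch only: missing_tools and require_tools are hypothetical helpers.

# Print the name of every tool in "$@" that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    # command -v exits non-zero when the name cannot be resolved
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Return 1 and list the missing tools on stderr if any are absent.
require_tools() {
  local missing
  missing=$(missing_tools "$@")
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Error: missing tools:" $missing >&2
    return 1
  fi
}
```

A pipeline script could then start with `require_tools detox unzip unrar 7z file || exit 1`.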

Step 2: Intelligent File Extraction

Instead of trusting file extensions, we use the file command to determine the actual MIME type:

# Unpack every archive in the current directory with the matching program
for file in ./*; do
  # Ask file(1) for the real type; -b drops the file name from the output
  mime=$(file -b --mime-type "$file")
  case "$mime" in
    *zip)
      echo "Unzip $file"
      unzip -d "${file%.zip}" "$file"
      ;;
    *rar)
      echo "Unrar $file"
      unrar x -ad "$file"
      ;;
    *7z-compressed)
      echo "7z $file"
      7z x "$file" -o"${file%.7z}"
      ;;
    *)
      # Not an archive type we handle (for example a directory); skip it
      continue
      ;;
  esac
  # The status of the case statement is that of the extraction command
  status=$?
  if [ "$status" -eq 0 ]; then
    # Extraction succeeded, so the archive itself is no longer needed
    rm -rf "$file"
  else
    # Extraction failed: move the archive aside and log it for manual review
    mv "$file" ../
    echo "$file" >>../log.txt
  fi
done
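
Matching only on the end of the MIME type, as the loop above does, also accepts `application/gzip`, because that string ends in `zip` too. A stricter variant compares the whole MIME type. The sketch below assumes `file` reports `application/zip`, `application/x-rar` or `application/vnd.rar`, and `application/x-7z-compressed` for the three archive formats; `extractor_for_mime` is a hypothetical helper name, so check the strings your version of `file` prints before relying on it.

```shell
#!/bin/bash
# Sketch only: extractor_for_mime is a hypothetical helper, and the MIME
# strings below are assumptions about what file(1) reports on your system.

# Print the extractor to use for a MIME type, or "none" if it is unsupported.
extractor_for_mime() {
  case "$1" in
    application/zip) echo unzip ;;
    application/x-rar | application/vnd.rar) echo unrar ;;
    application/x-7z-compressed) echo 7z ;;
    *) echo none ;;
  esac
}
```

Inside the loop this would be called as `tool=$(extractor_for_mime "$(file -b --mime-type "$file")")`, and anything that maps to `none` can be skipped or logged.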

Step 3: File Cleaning and Standardization

After extraction, we clean the files for JPlag compatibility:

# Remove macOS archive metadata folders
rm -rf ./*/__MACOSX
# Remove unneeded generated files
find . -type f -name '*.class' -delete
find . -type f -name '*.ctxt' -delete
# Delete macOS "._*" metadata files
find . -name '._*' -print0 | xargs -0 rm -f
# Detox the file names to prevent bad naming conventions from students.
detox -r ./*
# Remove all non-ASCII bytes (128-255) from the Java sources.
find . -type f -iname '*.java' -print | while IFS= read -r f; do
  echo "Removing non-ASCII bytes from $f"
  LANG=C sed -i 's/[\d128-\d255]//g' "$f"
done
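
After cleaning, it can be worth confirming that every submission folder still contains Java source before handing the directory to JPlag. The sketch below is an addition rather than part of the original pipeline: `submissions_without_java` is a hypothetical helper, and it assumes each student's work sits in its own top-level directory.

```shell
#!/bin/bash
# Sketch only: submissions_without_java is a hypothetical helper that assumes
# one top-level directory per student under the given root.

# Print each top-level directory under root that contains no .java file.
submissions_without_java() {
  local root="${1:-.}" dir
  for dir in "$root"/*/; do
    # An unmatched glob stays literal, so make sure this is a real directory
    [ -d "$dir" ] || continue
    # find prints one path per .java file; empty output means none were found
    if [ -z "$(find "$dir" -type f -iname '*.java' -print)" ]; then
      printf '%s\n' "${dir%/}"
    fi
  done
}
```

Running `submissions_without_java .` from the submissions directory lists folders that probably need a manual look, for example an archive that contained only compiled classes.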

Step 4: JPlag Analysis

Finally, we run JPlag with optimized settings:

java -jar ../jplag.jar -l java17 -vl -r results -s -m 50 zipped

Sample Results

Here’s what the JPlag output looks like:

[Image: JPlag similarity detection results]

JPlag generates comprehensive HTML reports showing similarity percentages and matching code sections

Interpreting Results

Reading JPlag results is straightforward, but it’s crucial to remember that similarity scores are guidelines, not definitive proof. Students with high similarity scores should be interviewed to determine if the similarity indicates:

Key Technical Insights

1. MIME Type Detection

Using file --mime-type instead of file extensions prevents processing errors from mislabeled files.

2. Error Handling

The script logs failed extractions, allowing for manual review of problematic submissions.
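
A short summary of that log makes the manual review easier to plan. The sketch below assumes the log holds one archive name per line, as written by the extraction loop, and that `../log.txt` is the default location; `summarize_failures` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: summarize_failures is a hypothetical helper. It assumes the
# log contains one failed archive name per line.

# Print how many distinct archives failed to extract, followed by their names.
summarize_failures() {
  local log="${1:-../log.txt}" count
  # -s is false when the log is missing or empty
  if [ ! -s "$log" ]; then
    echo "No failed extractions logged"
    return 0
  fi
  # sort -u drops repeats left behind when the pipeline was run more than once
  count=$(sort -u "$log" | wc -l | tr -d ' ')
  echo "$count submission(s) need manual review:"
  sort -u "$log"
}
```

Calling `summarize_failures ../log.txt` after the extraction step prints the count and the list in one go.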

3. Unicode Removal

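JPlag can struggle with Unicode characters, so we strip every non-ASCII byte (values 128 to 255) from the Java sources using sed under the LANG=C locale, which makes sed operate on single bytes rather than multibyte characters.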
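
Because the in-place edit is destructive, it can help to see which files contain non-ASCII bytes before anything is changed. The sketch below uses `tr` rather than the `sed` expression from the pipeline, which is a substitution on my part; `has_non_ascii` and `strip_non_ascii` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch only: has_non_ascii and strip_non_ascii are hypothetical helpers
# that use tr instead of the sed expression from the pipeline.

# Succeed if the file named by $1 contains any byte outside 7-bit ASCII.
has_non_ascii() {
  local count
  # Deleting bytes 0-127 (octal 000-177) leaves only the non-ASCII bytes
  count=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' ')
  [ "$count" -gt 0 ]
}

# Copy stdin to stdout with bytes 128-255 (octal 200-377) removed.
strip_non_ascii() {
  LC_ALL=C tr -d '\200-\377'
}
```

For a dry run, `has_non_ascii "$f" && echo "$f"` inside a loop over the `.java` files lists the affected submissions without modifying them.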

4. File Cleanup

Removing generated files (.class, .ctxt) and system files (__MACOSX) ensures JPlag focuses on source code.

Educational Philosophy

External Tutoring Services

My stance on students using external tutoring services is neutral. These services can act as personal tutors, which can be beneficial if:

The Real Issue: Student-Teacher Communication

What concerns me more is when students feel the teaching team is unapproachable. We should be their primary resource for help and guidance. If students are turning to external services, it might indicate a need to improve our accessibility and support systems.

Best Practices for Implementation

1. Documentation

2. Student Communication

3. Continuous Improvement

Conclusion

JPlag is a powerful tool that significantly improves the efficiency of code similarity detection in educational settings. However, it should be used as part of a comprehensive approach that includes:

The goal isn’t just to catch plagiarism—it’s to create an environment where students learn effectively and feel supported in their educational journey.


JPlag serves as a valuable assistant to the teaching team, helping us focus our attention where it’s most needed while maintaining academic integrity.

