Artificial intelligence is dynamically reshaping the Transport-Shipping-Logistics (TSL) industry by improving data analysis, automating processes, and supporting decision-making. As part of an experiment conducted by RST Software in collaboration with the Poznań School of Logistics, our AI-powered tool scored 84.3% on the professional freight forwarder exam, surpassing the required passing threshold of 75%.
This achievement demonstrates that AI can support freight forwarders in their daily tasks by automating repetitive duties, analyzing documents, and optimizing transport routes. In this article, we present the technical details of the experiment and the challenges we faced.
Challenges of the Experiment
The national freight forwarder exam consists of two parts:
- Written section – 60 multiple-choice questions (A-D).
- Practical section – requiring problem-solving and a written response.
Due to the limitations of AI models in evaluating the quality of open-ended answers, our experiment focused exclusively on the written section. Our goal was to determine how accurately AI could answer the exam’s test questions.
Preparing the Test Questions
To conduct the test, we had to extract the exam questions into a textual format. This process involved parsing PDF files containing exam questions from 2016 to 2024. Several challenges arose:
- Questions containing images – 10-15% of the exam questions included graphics, which were removed during text conversion. AI models were unable to answer these questions.
![](https://cdn.prod.website-files.com/625350933660db97afa01913/67ab883d0a676fd9eca8c54f_AD_4nXc2Sh0hvV39gtFw_PNzMOSIxR5p97BOVzjtjlZYaxksUxH3icZc0S3bn0gnWSPQMamWI0SkJkNfgrpsdOaAFHTaCWNI0ujEubqVUTpqXIa2Eshrvg6wF203DtMxypUtmhs9zRDa.png)
- Issues with reading files – some questions were not extracted correctly, but this was considered negligible at the testing stage.
- Tables in questions – the parsing process successfully extracted data from tables, preserving their content in the question set.
![](https://cdn.prod.website-files.com/625350933660db97afa01913/67ab883cd4082924528a1716_AD_4nXfzjSvwiG4kXzLmyxysahT5aGwr7jogWe7S4Uszq2MwZLdUBafbXUJFntic12KmZ-Hxdjni-44tZuql8DO-iKkdPQF2FK-qUJTtqt28Ky41YfqJ2-2Ud1OXdx-CZteC28-Up2s.png)
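Once the PDF text has been extracted, it still has to be split into individual questions with their four answer options. A minimal sketch of that step is shown below; the numbering pattern (`1.` for questions, `A.`–`D.` for options) and the sample text are assumptions about the exam layout, not the exact format RST processed.

```python
import re

# Hypothetical raw text as produced by a PDF-to-text tool; the real exam
# layout may differ, so the pattern below is an assumption.
RAW = """1. Which Incoterms 2020 rule places the most obligations on the seller?
A. EXW
B. FCA
C. DDP
D. FOB
"""

QUESTION_RE = re.compile(
    r"(?m)^(\d+)\.\s+(.+?)\n"      # question number and stem
    r"A\.\s+(.+?)\n"
    r"B\.\s+(.+?)\n"
    r"C\.\s+(.+?)\n"
    r"D\.\s+(.+?)(?=\n\d+\.|\Z)",  # stop at the next question or end of text
    re.S,
)

def parse_questions(text: str) -> list[dict]:
    """Split extracted exam text into question dicts with four options."""
    questions = []
    for num, stem, a, b, c, d in QUESTION_RE.findall(text):
        questions.append({
            "number": int(num),
            "stem": stem.strip(),
            "options": {"A": a.strip(), "B": b.strip(),
                        "C": c.strip(), "D": d.strip()},
        })
    return questions
```

A parser like this also makes the image problem visible early: questions whose stem or options come out empty after extraction can be flagged and excluded from the test set.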
AI model thought structure
To enhance the AI’s effectiveness in solving the exam, we imposed a structured methodology for analysis:
- Extracting key information from the question (key_insights)
- Considering all available answer choices (reasoning)
- Critically evaluating each option (alternative_considerations)
- Assessing confidence in the answer choice (confidence)
- Final answer selection (selected_answer)
Altering this sequence resulted in poorer performance for each model, making it crucial to maintain a structured decision-making approach.
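The five-step sequence above can be enforced by requiring the model to reply as a JSON object with the fields in that exact order and then validating the reply before accepting it. The sketch below shows one way to do this; the field names come from the article, but the JSON layout and sample reply are illustrative assumptions about how a structured-output mode might be wired up.

```python
import json

# The five reasoning fields, in the order that worked best in the tests.
FIELD_ORDER = [
    "key_insights",
    "reasoning",
    "alternative_considerations",
    "confidence",
    "selected_answer",
]

def validate_response(raw: str) -> dict:
    """Parse a model reply and check all fields are present, in order."""
    data = json.loads(raw)
    if list(data.keys()) != FIELD_ORDER:
        raise ValueError(f"fields out of order or missing: {list(data.keys())}")
    if data["selected_answer"] not in {"A", "B", "C", "D"}:
        raise ValueError("selected_answer must be a single letter A-D")
    if not 0.0 <= float(data["confidence"]) <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return data

# Illustrative well-formed reply (not a real model output):
sample = json.dumps({
    "key_insights": "Question asks about carrier liability under CMR.",
    "reasoning": "Option B matches the liability limit in the convention.",
    "alternative_considerations": "A and C describe rail carriage, not road.",
    "confidence": 0.9,
    "selected_answer": "B",
})
```

Rejecting and retrying any reply that fails validation keeps the model on the prescribed reasoning path instead of letting it jump straight to an answer.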
![](https://cdn.prod.website-files.com/625350933660db97afa01913/67ab883c550a4ce0913f1112_AD_4nXfAYZrpberhGrNMy_MKZrPgCoGK0rVqZtAo_GAsAuDzsa-HshfhjgmoFaLQAEXdjCjsnS1da1PrxHBerdlLSEgtu2SEZif4MYKjKiLyhqmgENU9vEthz-jFhsY11hVvOc_TBPeF.png)
Additionally, the models could not always settle on a single correct answer: when their self-reported confidence fell below 1, they often selected the wrong option.
![](https://cdn.prod.website-files.com/625350933660db97afa01913/67ab883ce1e2f79996bd49ff_AD_4nXf1iPRLHYR1aP75f35piC1ttq9Cry5SQKCIEA4UWvBeo2DW6QGc8c0GEOWx-MKKQ-dYOeXntKeB-bIqH-DvoJADgQ3H9W0jZVZgmginOieEOoytYhUegNIJIbAys_FGVOiPMzTM.png)
Another challenge was forcing the models to answer with nothing but a single letter (A, B, C, or D) and no additional text, which required tight control over the response-generation step.
Test results
During the experiment, we tested more than 15 different LLMs, processing over 20 million tokens in total. The system was evaluated on 1,550 exam questions from 2016 to 2024.
Results achieved:
- Average score: 84.3% (passing the exam required 75%).
- Best result: 89.4% – 2019 exam.
Results by LLM model:
- Claude 3.5 achieved the highest score.
- GPT-4.0, Grok 2, and Gemini delivered very similar results.
- Llama 3 performed the worst – the oldest model in the set.
- Chinese DeepSeek achieved very high scores at the lowest operational cost, making it an interesting alternative for further research.
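Comparing models like this comes down to a simple per-model accuracy over the same question set. The sketch below shows one way to aggregate (model, predicted, correct) records; the records themselves are illustrative only, not the experiment's actual data.

```python
from collections import defaultdict

def score_runs(runs: list[tuple[str, str, str]]) -> dict[str, float]:
    """Compute per-model accuracy (%) from (model, predicted, correct) records."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # model -> [correct, total]
    for model, predicted, correct in runs:
        totals[model][0] += predicted == correct
        totals[model][1] += 1
    return {m: round(c / t * 100, 1) for m, (c, t) in totals.items()}

# Illustrative records only -- not the experiment's data.
runs = [
    ("claude-3.5", "B", "B"),
    ("claude-3.5", "C", "C"),
    ("claude-3.5", "A", "D"),
    ("llama-3", "B", "B"),
    ("llama-3", "A", "C"),
]
```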
![](https://cdn.prod.website-files.com/625350933660db97afa01913/67ab883c9c3b2935ecdfe662_AD_4nXdQgOJxymcZPQsgMqOLCqeZPU7_GEOYZhxwnnJSnQ7HdSMe_kgSDHWu2w9Xjif5nUvnHf51dtqwzgN3DYrV0Kc7Jp0_WH7r2p4TQMXRfHpPXgHrzU2KV8zJCty6hf9C9IVk5Uc.png)
Conclusions and future directions
The experiment demonstrated that artificial intelligence can effectively analyze exam tests, showing high accuracy in identifying correct answers. This opens doors for further applications of AI technology in logistics automation, route planning, resource management, and cost optimization in transportation.
Our research also suggests that AI will not replace freight forwarders (at least not yet), since it struggles with negotiation, relationship-building with partners, and interpreting emotional context. However, freight forwarders who can efficiently integrate AI into their workflow will gain a significant advantage.
At RST Software, we not only explore AI’s potential but also implement real solutions for transport companies. If you want to learn more about AI’s capabilities in logistics, contact us – together, we will find a solution tailored to your needs.