SmartDots > AI Testing
This document describes how AI algorithms can be integrated and tested within the SmartDots platform, the ICES international tool for standardizing fish age determination. It covers two integration modes: retrospective benchmarking against finalized expert events (used as Gold Standard datasets), and proactive participation in live workshops alongside human readers. The document also details the scoring framework used to evaluate AI performance, including the penalty formula, reporting structure, and API endpoints for submitting and retrieving results.
SmartDots is the international platform designed to standardize the age determination of fish. It currently also supports exchanges, workshops, and training events for maturity determination and for the identification of larvae and eggs. This AI integration focuses on age-reading events. Age reading usually involves counting the "annuli" (annual growth rings) in calcified structures, most commonly otoliths (ear stones), although for some species other structures such as scales or vertebrae are used.
Because interpreting these rings is subjective, Exchanges and Workshops are essential for quality control.
An Exchange is a remote, large-scale event where multiple researchers from different countries read the same set of fish images independently.
A Workshop is a more intensive, often face-to-face (or synchronized virtual) meeting that usually follows an exchange.
SmartDots exchanges and workshops provide a rigorous, standardized framework ideal for testing AI performance, because they offer a "Gold Standard" dataset. The biggest hurdle in AI development is often the lack of high-quality, labeled data. Because workshops are designed specifically to reach consensus on the most difficult samples, the resulting Modal Age serves as a peer-reviewed benchmark. An AI can be inserted into an exchange as if it were a human participant, allowing coordinators to directly compare the algorithm's accuracy, precision, and bias against a panel of international experts under identical conditions.
Furthermore, since exchanges involve samples from different institutes, regions, and fish stocks, they test the AI's robustness and generalization — crucial to ensuring an algorithm doesn't merely "memorize" one type of otolith but actually learns the biological features of growth rings. The statistical metrics already built into SmartDots — such as Coefficient of Variation (CV) and Percentage Agreement (PA) — provide an immediate, standard scorecard for the AI's reliability.
Using these events to test AI creates a clear pathway for operational integration. AI developers can identify exactly where an algorithm fails (e.g., misidentifying "false rings" or "edge growth"), and expert discussions become the specific training data used to improve the next model version — eventually leading to a fully calibrated and trusted automated system.
SmartDots utilizes a modular architecture where the software and database communicate via an API. This allows AI algorithms to interact with the system exactly like human users. To maintain data integrity, AI agents are assigned unique identifiers. While human readers are anonymized by country (e.g., DK01, DK02), algorithms are always identified as AI01, AI02, etc., ensuring humans and AI agents are never confused in datasets.
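The naming convention above can be checked mechanically when processing event data. The following is a minimal sketch, assuming AI identifiers are always the prefix AI followed by digits; the exact identifier format beyond the examples AI01 and AI02 is an assumption, and the helper name `is_ai_reader` is hypothetical rather than part of SmartDots.

```python
import re

# Assumed pattern: "AI" followed by one or more digits (e.g. AI01, AI02).
_AI_ID = re.compile(r"AI\d+")


def is_ai_reader(reader_id: str) -> bool:
    """Return True if the identifier follows the assumed AI naming scheme."""
    return _AI_ID.fullmatch(reader_id) is not None


# Split a mixed list of reader identifiers into AI agents and human readers.
readers = ["DK01", "DK02", "AI01", "AI02"]
ai_readers = [r for r in readers if is_ai_reader(r)]
human_readers = [r for r in readers if not is_ai_reader(r)]
```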
There are two primary ways for an AI algorithm to operate within SmartDots: with past events (retrospective) or with ongoing events (proactive).
To facilitate benchmarking AI algorithms against past events, the following has been prepared:
Data Licensing and Access: No formal registration is required to access past events, but the AI algorithm can only access public events. Since 2026, all closed workshops and exchanges fall under the CC BY 4.0 license, so any AI algorithm can access these public data via the SmartDots API.
Workflow: The AI performs a "blind" upload of its predictions. SmartDots then automatically calculates a score based on the historical consensus. Since these events are closed, the Modal Age and Percentage of Agreement are locked, providing a "Gold Standard" for validation.
Once a SmartDots workshop or exchange is concluded and results are made public, the event serves as a Gold Standard dataset. The fixed consensus reached by human experts provides the perfect environment for testing and ranking AI algorithms.
This framework evaluates an AI algorithm's ability to predict fish age by comparing its outputs against the Modal Age (the consensus reached by expert human readers). The system uses a weighted penalty logic that scales based on the number of samples in the event and the confidence level of the human experts.
This ranking system is unique because it penalizes the AI based on human certainty: if the experts all agree (high agreement) and the AI is wrong, the penalty is higher. If the experts are unsure (low agreement), the penalty is lower. Scores are relative to dataset size, ensuring a small event and a large event are both graded on a 0–100 scale.
Performance is measured on a scale from 0 to 100, where 100 represents a perfect match with the human expert consensus.
For every sample where the AI's predicted age does not match the Modal Age, a penalty is deducted from the total score.
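The penalty formula is not spelled out in this section, but the worked examples in the table below are consistent with the following form. It is a reconstruction offered as an assumption, not the official definition, and the symbols are introduced here for illustration only: $N$ is the number of samples in the event, $v$ the value of each sample, $a_i$ the age predicted by the AI for sample $i$, $m_i$ its Modal Age, $PA_i$ the Percentage of Agreement for the Modal Age expressed as a fraction between 0 and 1, and $p_i$ the points deducted.

$$
v = \frac{100}{N}, \qquad
p_i = 0.25 \cdot \min\bigl(|a_i - m_i|,\ 4\bigr) \cdot PA_i \cdot v, \qquad
\text{Score} = 100 - \sum_{i=1}^{N} p_i
$$

Under this reading, the factor 0.25 and the distance cap of 4 together mean that a single sample can never cost more than its own value $v$.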
Variables:
In this example the event has 10 samples, so each sample is worth 10 points (100 / 10 = 10). The Modal Age for the fish sample is 3.
| AI Guess | Distance to Modal Age | Percentage of Agreement | Points Deducted |
|---|---|---|---|
| 3 | 0 | 100% | 0.25 × 0 × 1 × 10 = 0 |
| 4 | 1 | 50% | 0.25 × 1 × 0.5 × 10 = 1.25 |
| 5 | 2 | 70% | 0.25 × 2 × 0.7 × 10 = 3.5 |
| 7 | 4 (Cap) | 100% | 0.25 × 4 × 1 × 10 = 10 |
| 8 | 4 (Cap) | 35% | 0.25 × 4 × 0.35 × 10 = 3.5 |
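The penalty logic illustrated by the table can be sketched in a few lines of Python. This is an illustrative reading of the table, not the SmartDots implementation: the function names are hypothetical, the agreement is passed as a fraction between 0 and 1, and samples the AI did not report are simply left out because their treatment is not described here.

```python
from typing import Iterable, Tuple

DISTANCE_CAP = 4       # distances above 4 are treated as 4 (from the table)
PENALTY_FACTOR = 0.25  # a capped miss at 100% agreement costs the full sample value


def sample_penalty(ai_age: int, modal_age: int, agreement: float, sample_value: float) -> float:
    """Points deducted for one sample; `agreement` is a fraction in [0, 1]."""
    distance = min(abs(ai_age - modal_age), DISTANCE_CAP)
    return PENALTY_FACTOR * distance * agreement * sample_value


def event_score(readings: Iterable[Tuple[int, int, float]], n_samples: int) -> float:
    """Score on a 0-100 scale for (ai_age, modal_age, agreement) triples."""
    sample_value = 100 / n_samples
    total_penalty = sum(sample_penalty(a, m, pa, sample_value) for a, m, pa in readings)
    return 100 - total_penalty


# Four rows of the table: Modal Age 3, sample value 10.
rows = [(3, 1.00), (5, 0.70), (7, 1.00), (8, 0.35)]
penalties = [round(sample_penalty(guess, 3, pa, 10), 2) for guess, pa in rows]
print(penalties)  # [0.0, 3.5, 10.0, 3.5]
```

Keeping the cap and the factor as named constants makes it easy to see that the largest possible deduction for one sample equals the sample value.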
In addition to the final AI score, the report includes the Percentage of Agreement (PA), calculated based on the consensus among human readers for a specific age.
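As a rough illustration of how a Percentage of Agreement can be derived from reader annotations, the sketch below computes the share of readers that assigned each age to a fish, and from that the Modal Age and its PA. The reader ages are hypothetical, the function names are not part of SmartDots, and the tie-breaking rule (lowest age wins) is an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple


def agreement_by_age(reader_ages: List[int]) -> Dict[int, float]:
    """Share of readers (in percent) that assigned each age."""
    counts = Counter(reader_ages)
    return {age: 100 * n / len(reader_ages) for age, n in counts.items()}


def modal_age_and_pa(reader_ages: List[int]) -> Tuple[int, float]:
    """Most frequent age and its percentage of agreement.

    Ties are broken by the lowest age; that tie rule is an assumption.
    """
    pa = agreement_by_age(reader_ages)
    modal = min(pa, key=lambda age: (-pa[age], age))
    return modal, pa[modal]


# Hypothetical readings from ten readers for one fish.
ages = [3, 3, 3, 3, 3, 3, 3, 4, 4, 2]
modal, pa = modal_age_and_pa(ages)
pa_for_age_4 = agreement_by_age(ages)[4]
```

With these readings the Modal Age is 3 with a PA of 70%, while an AI that reported age 4 would have a PA of 20% for its reported age.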
Example Scenario (ten human readers):
Why PA Matters for Benchmarking:
To see an example of the benchmarking output, see Annex 1. An API call also exists for demonstration: it accepts an event ID (tblEventID) and a reader name (reader), then scores that reader's readings as if they had been submitted by an AI algorithm:
https://smartdots.ices.dk/API/getScoringReaderInEvent?tblEventID=1842&reader=balliu
The output includes a graph showing the performance of the "AI algorithm" plotted alongside actual readers of the event:
https://smartdots.ices.dk/API/GetScoringInEventGraphs?tblEventID=1842&AIID=30
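The demonstration call can be scripted with the Python standard library. The sketch below builds the first URL shown above from its two query parameters. The helper names are hypothetical, and `fetch_json` assumes the endpoint answers with a JSON body, which this section does not state explicitly; the network call is therefore left commented out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://smartdots.ices.dk/API"


def scoring_reader_url(event_id: int, reader: str) -> str:
    """Build the demonstration URL that scores a human reader as if it were an AI."""
    query = urlencode({"tblEventID": event_id, "reader": reader})
    return f"{BASE_URL}/getScoringReaderInEvent?{query}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = scoring_reader_url(1842, "balliu")
# report = fetch_json(url)  # network call, left commented out in this sketch
```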
Registered institutes can deploy algorithms to participate in live workshops alongside human experts.
Institutes can formally register their AI algorithms within SmartDots. Registration can only be completed by a country coordinator.
When a new exchange or workshop is created, the Event Coordinator has a checkbox to allow AI algorithms into the session. If an algorithm is registered for that species and area, it will be permitted to participate.
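The admission rule described above amounts to a simple conjunction: the coordinator must have enabled AI participation, and the algorithm must be registered for the event's species and area. A minimal sketch follows, with hypothetical class and field names and hypothetical example values; it also assumes, for simplicity, that an algorithm is registered for exactly one species and one area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    species: str
    area: str
    allow_ai: bool  # the Event Coordinator's checkbox


@dataclass(frozen=True)
class RegisteredAlgorithm:
    name: str
    species: str
    area: str


def may_participate(algorithm: RegisteredAlgorithm, event: Event) -> bool:
    """True if the event admits AI and the registration matches species and area."""
    return (
        event.allow_ai
        and algorithm.species == event.species
        and algorithm.area == event.area
    )


# Hypothetical example values.
algorithm = RegisteredAlgorithm("TestAI", "Gadus morhua", "27.4")
open_event = Event("Gadus morhua", "27.4", allow_ai=True)
closed_event = Event("Gadus morhua", "27.4", allow_ai=False)
other_area = Event("Gadus morhua", "27.7", allow_ai=True)
```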
When ages for an event are uploaded, the SmartDots platform compares them against the annotations made during the event, generating an AI Score and an Average Percentage of Agreement.
| Field | Description |
|---|---|
| Success | Indicates the event was successfully located and the report generated. |
| Number_samples_in_event | Total number of distinct fish present in the event. |
| each_sample_value | The weighted value assigned to each individual sample: 100 / Number_samples_in_event |
| AI_Score | The calculated performance score for the AI. |
| Readers_Average_Score | The average calculated performance of all readers. |
| AI_Average_Percentage_Agreement | The overall average agreement across all samples. |
| all_Readers_average_percentage_agreement_for_modal_age | The average percentage of agreement between readers for all FishIDs. |
| number_of_samples_reported_by_AI | The count of samples where the AI successfully provided an age. |
| Invalid_Fish_Sent_By_AI | A list of any fish IDs submitted with errors or invalid data. |
| message | Status confirmation (e.g., "Report processed and scored successfully"). |
| see_performance_... | URL link to visual performance graphs comparing AI against human readers. |
```json
{
  "success": true,
  "number_samples_in_event": 162,
  "each_sample_value": 0.617283950617284,
  "aI_finalScore": 97.479383,
  "aI_AveragePercentageAgreement": 82.7159259259259,
  "all_fishID_average_percentage_agreement_for_modal_age": 85.0822222222222,
  "number_of_samples_reported_by_AI": 160,
  "invalid_Fish_Sent_By_Ai": [],
  "message": "Report processed and scored successfully.",
  "see_performance_algorithm_against_readers": "https://smartdots.ices.dk/API/GetScoringInEventGraphs?tblEventID=1842&AIID=34"
}
```
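A client may want to sanity-check a report before using it. The sketch below parses a trimmed copy of the example response, verifies that each_sample_value equals 100 divided by the number of samples, and counts how many samples the AI left unreported. It uses only keys that appear in the example; the variable names are illustrative.

```python
import json
import math

# Trimmed copy of the example response above.
report_text = """
{
  "success": true,
  "number_samples_in_event": 162,
  "each_sample_value": 0.617283950617284,
  "number_of_samples_reported_by_AI": 160,
  "invalid_Fish_Sent_By_Ai": []
}
"""

report = json.loads(report_text)

# each_sample_value should equal 100 / number_samples_in_event.
expected_value = 100 / report["number_samples_in_event"]
value_is_consistent = math.isclose(report["each_sample_value"], expected_value, rel_tol=1e-9)

# Samples present in the event for which the AI reported no age.
unreported = report["number_samples_in_event"] - report["number_of_samples_reported_by_AI"]
```

For this report the check passes and two of the 162 samples were not reported by the AI.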
This is an example graph for event 1842, where human readers are the blue bars and the AI algorithm is the green one:
| Field | Description |
|---|---|
| tblEventID | The event identifier. |
| modalAge | The modal age of this record. |
| average_Percentage_agreement_between_readers | The average percentage of agreement between readers for that modal age. |
| number_samples_with_that_modal_age | Number of samples with that modal age. |
| number_samples_read_AI | Number of samples that the AI read in that modal age group. |
| average_percentage_agreement_for_AI_for_modal_age | The average percentage of agreement for the AI for that modal age. |
A detailed record is generated for every fish in the event.
| Field | Description |
|---|---|
| FishID | The unique identifier for the fish sample. |
| Status | Reported, Missing, or Not Found. |
| AI_reported_Age | The age reported by the AI for this fish. |
| Readers_Modal_Age | The modal age according to the readers of the event. |
| Percentage_of_agreement_modalAge | The percentage of agreement for the modal age. |
| Percentage_of_agreement_reported_age | The percentage of agreement for the AI-reported age. |
| Score | The score for that individual fish. |
The integration of AI into SmartDots represents a significant evolution in biological data processing within the ICES community. This roadmap achieves three critical goals:
As this roadmap progresses over the next two years, the continuous feedback loop between institutional AI models and SmartDots' reporting tools will pave the way for more efficient, consistent, and scalable biological readings across all member institutes.
```json
{
  "success": true,
  "number_samples_in_event": 162,
  "each_sample_value": 0.617283950617284,
  "aI_Score": 97.2,
  "readers_Average_Score": 97.3,
  "aI_Average_Percentage_Agreement": 82.7159259259259,
  "all_Readers_average_percentage_agreement_for_modal_age": 85.0822222222222,
  "number_of_samples_reported_by_AI": 160,
  "invalid_Fish_Sent_By_Ai": [],
  "message": "Report processed and scored successfully.",
  "see_performance_algorithm_against_readers": "https://smartdots.ices.dk/API/GetScoringInEventGraphs?tblEventID=1842&AIID=34",
  "performance_AI_Per_Modal_Age": [
    { "tblEventID": 1842, "modalAge": 0, "average_Percentage_agreement_between_readers": 55.56, "number_samples_with_that_modal_age": 3, "number_samples_read_AI": 1, "average_pergentage_agreement_for_AI_for_modal_age": 16.67 },
    { "tblEventID": 1842, "modalAge": 1, "average_Percentage_agreement_between_readers": 92.86, "number_samples_with_that_modal_age": 7, "number_samples_read_AI": 8, "average_pergentage_agreement_for_AI_for_modal_age": 87.5 },
    { "tblEventID": 1842, "modalAge": 2, "average_Percentage_agreement_between_readers": 97.92, "number_samples_with_that_modal_age": 8, "number_samples_read_AI": 8, "average_pergentage_agreement_for_AI_for_modal_age": 97.92 },
    { "tblEventID": 1842, "modalAge": 3, "average_Percentage_agreement_between_readers": 91.67, "number_samples_with_that_modal_age": 16, "number_samples_read_AI": 17, "average_pergentage_agreement_for_AI_for_modal_age": 89.22 },
    { "tblEventID": 1842, "modalAge": 4, "average_Percentage_agreement_between_readers": 83.93, "number_samples_with_that_modal_age": 28, "number_samples_read_AI": 25, "average_pergentage_agreement_for_AI_for_modal_age": 88.0 },
    { "tblEventID": 1842, "modalAge": 5, "average_Percentage_agreement_between_readers": 88.51, "number_samples_with_that_modal_age": 29, "number_samples_read_AI": 31, "average_pergentage_agreement_for_AI_for_modal_age": 86.02 },
    { "tblEventID": 1842, "modalAge": 6, "average_Percentage_agreement_between_readers": 87.78, "number_samples_with_that_modal_age": 15, "number_samples_read_AI": 15, "average_pergentage_agreement_for_AI_for_modal_age": 87.78 },
    { "tblEventID": 1842, "modalAge": 7, "average_Percentage_agreement_between_readers": 86.36, "number_samples_with_that_modal_age": 11, "number_samples_read_AI": 11, "average_pergentage_agreement_for_AI_for_modal_age": 86.36 },
    { "tblEventID": 1842, "modalAge": 8, "average_Percentage_agreement_between_readers": 77.78, "number_samples_with_that_modal_age": 12, "number_samples_read_AI": 12, "average_pergentage_agreement_for_AI_for_modal_age": 77.78 },
    { "tblEventID": 1842, "modalAge": 9, "average_Percentage_agreement_between_readers": 88.10, "number_samples_with_that_modal_age": 7, "number_samples_read_AI": 7, "average_pergentage_agreement_for_AI_for_modal_age": 88.10 },
    { "tblEventID": 1842, "modalAge": 10, "average_Percentage_agreement_between_readers": 83.33, "number_samples_with_that_modal_age": 6, "number_samples_read_AI": 7, "average_pergentage_agreement_for_AI_for_modal_age": 78.57 },
    { "tblEventID": 1842, "modalAge": 11, "average_Percentage_agreement_between_readers": 71.11, "number_samples_with_that_modal_age": 15, "number_samples_read_AI": 14, "average_pergentage_agreement_for_AI_for_modal_age": 72.62 },
    { "tblEventID": 1842, "modalAge": 12, "average_Percentage_agreement_between_readers": 83.33, "number_samples_with_that_modal_age": 4, "number_samples_read_AI": 4, "average_pergentage_agreement_for_AI_for_modal_age": 83.33 },
    { "tblEventID": 1842, "modalAge": 14, "average_Percentage_agreement_between_readers": 83.33, "number_samples_with_that_modal_age": 1, "number_samples_read_AI": 0, "average_pergentage_agreement_for_AI_for_modal_age": 0 }
  ],
  "detailed_scores_per_fishID": [ "... (162 records) ..." ]
}
```
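The per-modal-age records in a report like the one above can be scanned to find the age groups where the algorithm agrees with the consensus less often than the human readers do. The sketch below uses a handful of records copied from the example, with key names spelled exactly as they appear there. The function name and the margin parameter are illustrative, and skipping age groups with no AI readings is a choice made for this sketch.

```python
from typing import Dict, List

# Key names spelled exactly as in the example response.
AI_KEY = "average_pergentage_agreement_for_AI_for_modal_age"
READERS_KEY = "average_Percentage_agreement_between_readers"


def ages_where_ai_lags(records: List[Dict], margin: float = 0.0) -> List[int]:
    """Modal ages where AI agreement is below reader agreement by more than `margin` points."""
    return [
        r["modalAge"]
        for r in records
        if r["number_samples_read_AI"] > 0 and r[READERS_KEY] - r[AI_KEY] > margin
    ]


# A few records taken from the example report.
records = [
    {"modalAge": 0, READERS_KEY: 55.56, "number_samples_read_AI": 1, AI_KEY: 16.67},
    {"modalAge": 2, READERS_KEY: 97.92, "number_samples_read_AI": 8, AI_KEY: 97.92},
    {"modalAge": 4, READERS_KEY: 83.93, "number_samples_read_AI": 25, AI_KEY: 88.0},
    {"modalAge": 10, READERS_KEY: 83.33, "number_samples_read_AI": 7, AI_KEY: 78.57},
    {"modalAge": 14, READERS_KEY: 83.33, "number_samples_read_AI": 0, AI_KEY: 0},
]

lagging = ages_where_ai_lags(records)
lagging_by_5 = ages_where_ai_lags(records, margin=5.0)
```

In this subset the algorithm lags the readers for modal ages 0 and 10, and only age 0 remains once a margin of five percentage points is required.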
This is an example of the JSON body for the POST request to http://smartdots.ices.dk/API/submitFishAges:
```json
{
  "email": "researcher@example.com",
  "tblEventID": 1842,
  "institute": "Test",
  "comments": "this is a test",
  "nameAIAlgorithm": "TestAI",
  "version": "0.02",
  "reports": [
    { "fishID": "FISH-99A", "age": 4 },
    { "fishID": "FISH-102B", "age": 7 },
    { "fishID": "FISH-205C", "age": 2 }
  ]
}
```
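A submission like the one above could be prepared with the Python standard library as sketched below. The local validation rules (unique fish IDs, non-negative integer ages) and the Content-Type header are assumptions rather than documented requirements, and the function name is hypothetical. The request is only built here, not sent.

```python
import json
from urllib.request import Request

SUBMIT_URL = "http://smartdots.ices.dk/API/submitFishAges"  # URL as given above


def build_submission(payload: dict) -> Request:
    """Validate the payload locally and wrap it in a POST request (not sent here)."""
    fish_ids = [r["fishID"] for r in payload["reports"]]
    if len(fish_ids) != len(set(fish_ids)):
        raise ValueError("duplicate fishID in reports")
    for r in payload["reports"]:
        if not isinstance(r["age"], int) or r["age"] < 0:
            raise ValueError(f"invalid age for {r['fishID']}: {r['age']!r}")
    body = json.dumps(payload).encode("utf-8")
    # The JSON content type is an assumption about what the endpoint expects.
    return Request(SUBMIT_URL, data=body, headers={"Content-Type": "application/json"}, method="POST")


payload = {
    "email": "researcher@example.com",
    "tblEventID": 1842,
    "institute": "Test",
    "comments": "this is a test",
    "nameAIAlgorithm": "TestAI",
    "version": "0.02",
    "reports": [
        {"fishID": "FISH-99A", "age": 4},
        {"fishID": "FISH-102B", "age": 7},
        {"fishID": "FISH-205C", "age": 2},
    ],
}
request = build_submission(payload)
```

Passing `request` to `urllib.request.urlopen` would send it. Validating before sending keeps obviously malformed records out of the Invalid_Fish_Sent_By_AI list.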