SmartDots






About this page

This document describes how AI algorithms can be integrated and tested within the SmartDots platform, the ICES international tool for standardizing fish age determination. It covers two integration modes: retrospective benchmarking against finalized expert events (used as Gold Standard datasets), and proactive participation in live workshops alongside human readers. The document also details the scoring framework used to evaluate AI performance, including the penalty formula, reporting structure, and API endpoints for submitting and retrieving results.

Roadmap for AI Integration in SmartDots Platform

(Leveraging SmartDots Frameworks for the Validation of AI Ageing Algorithms)



SmartDots is the international platform designed to standardize the age determination of fish. It also supports exchanges, workshops, and training events for maturity determination and for larvae and egg identification. This AI integration focuses on age reading events. Age reading usually involves counting the "annuli" (annual growth rings) in calcified structures, most commonly otoliths (ear stones), although for some species other structures such as scales or vertebrae are used.

Because interpreting these rings is subjective, Exchanges and Workshops are essential for quality control.

SmartDots Exchanges

An Exchange is a remote, large-scale event where multiple researchers from different countries read the same set of fish images independently.




SmartDots Workshops

A Workshop is a more intensive, often face-to-face (or synchronized virtual) meeting that usually follows an exchange.




Why Use Exchanges and Workshops as Benchmarks for AI Performance?

SmartDots exchanges and workshops provide a rigorous, standardized framework ideal for testing AI performance, because they offer a "Gold Standard" dataset. The biggest hurdle in AI development is often the lack of high-quality, labeled data. Because workshops are designed specifically to reach consensus on the most difficult samples, the resulting Modal Age serves as a peer-reviewed benchmark. An AI can be inserted into an exchange as if it were a human participant, allowing coordinators to directly compare the algorithm's accuracy, precision, and bias against a panel of international experts under identical conditions.

Furthermore, since exchanges involve samples from different institutes, regions, and fish stocks, they test the AI's robustness and generalization — crucial to ensuring an algorithm doesn't merely "memorize" one type of otolith but actually learns the biological features of growth rings. The statistical metrics already built into SmartDots — such as Coefficient of Variation (CV) and Percentage Agreement (PA) — provide an immediate, standard scorecard for the AI's reliability.

Using these events to test AI creates a clear pathway for operational integration. AI developers can identify exactly where an algorithm fails (e.g., misidentifying "false rings" or "edge growth"), and expert discussions become the specific training data used to improve the next model version — eventually leading to a fully calibrated and trusted automated system.




How to Integrate an AI Reader in SmartDots?

SmartDots utilizes a modular architecture where the software and database communicate via an API. This allows AI algorithms to interact with the system exactly like human users. To maintain data integrity, AI agents are assigned unique identifiers. While human readers are anonymized by country (e.g., DK01, DK02), algorithms are always identified as AI01, AI02, etc., ensuring humans and AI agents are never confused in datasets.

There are two primary ways for an AI algorithm to operate within SmartDots: with past events (retrospective) or with ongoing events (proactive).




Retrospective Benchmarking (Closed/Public Events)

To facilitate benchmarking AI algorithms against past events, the following has been prepared:

Data Licensing and Access: No formal registration is required to access past events, but an AI algorithm can access only public events. Since 2026, all closed workshops and exchanges fall under the CC BY 4.0 license, so any AI algorithm can access these public data via the SmartDots API.

Workflow: The AI performs a "blind" upload of its predictions. SmartDots then automatically calculates a score based on the historical consensus. Since these events are closed, the Modal Age and Percent Agreement are locked, providing a "Gold Standard" for validation.

Retrospective AI Benchmarking for Closed Events

Once a SmartDots workshop or exchange is concluded and results are made public, the event serves as a Gold Standard dataset. The fixed consensus reached by human experts provides the perfect environment for testing and ranking AI algorithms.

1. The Workflow: Testing Against History




AI Performance Scoring Framework: SmartDots Integration

This framework evaluates an AI algorithm's ability to predict fish age by comparing its outputs against the Modal Age (the consensus reached by expert human readers). The system uses a weighted penalty logic that scales based on the number of samples in the event and the confidence level of the human experts.

This ranking system is unique because it penalizes the AI based on human certainty: if the experts all agree (high agreement) and the AI is wrong, the penalty is higher. If the experts are unsure (low agreement), the penalty is lower. Scores are relative to dataset size, ensuring a small event and a large event are both graded on a 0–100 scale.

1. The Scoring Philosophy

Performance is measured on a scale from 0 to 100, where 100 represents a perfect match with the human expert consensus.

2. The Penalty Formula

For every sample where the AI's predicted age does not match the Modal Age, a penalty is deducted from the total score.

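Variables (the value of M and the cap on D are read from the scoring examples in section 4):

M: multiplier, 0.25 in the examples
D: distance between the AI's predicted age and the Modal Age, capped at 4
C: Percentage of Agreement among the human readers, expressed as a fraction (100% = 1)
V: value of each sample, 100 / number of samples in the event

Penalty per Sample = M × D × C × V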

3. Final Score Calculation

Final Score = 100 − Σ Penalties

4. Scoring Examples (Based on 10 Fish Samples)

Each sample is worth 10 points (100 / 10 = 10). The Modal Age for each fish sample in this example is 3.

| AI Guess | Distance to Modal Age | Percentage of Agreement | Points Deducted |
| --- | --- | --- | --- |
| 3 | 0 | 100% | 0.25 × 0 × 1 × 10 = 0 |
| 4 | 1 | 50% | 0.25 × 1 × 0.5 × 10 = 1.25 |
| 5 | 2 | 70% | 0.25 × 2 × 0.7 × 10 = 3.5 |
| 7 | 4 (Cap) | 100% | 0.25 × 4 × 1 × 10 = 10 |
| 8 | 4 (Cap) | 35% | 0.25 × 4 × 0.35 × 10 = 3.5 |
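
The penalty formula and the examples above can be sketched in a few lines of Python. This is an illustration only, not the SmartDots implementation: the multiplier of 0.25 and the distance cap of 4 are read from the examples table, the function names are hypothetical, and the treatment of samples the AI did not report is not specified in this document, so it is left out.

```python
def sample_penalty(ai_age, modal_age, agreement, n_samples,
                   multiplier=0.25, distance_cap=4):
    """Penalty for one sample, following Penalty = M x D x C x V.

    agreement is the Percentage of Agreement as a fraction (0.7 for 70%).
    multiplier and distance_cap are assumptions taken from the examples.
    """
    distance = min(abs(ai_age - modal_age), distance_cap)  # D, capped
    sample_value = 100 / n_samples                          # V
    return multiplier * distance * agreement * sample_value


def final_score(readings, n_samples):
    """Final Score = 100 minus the sum of penalties.

    readings is an iterable of (ai_age, modal_age, agreement) tuples,
    one per sample that the AI reported.
    """
    total_penalty = sum(
        sample_penalty(ai_age, modal_age, agreement, n_samples)
        for ai_age, modal_age, agreement in readings
    )
    return 100 - total_penalty


if __name__ == "__main__":
    # Third row of the examples: guess 5, Modal Age 3, 70% agreement, 10 samples.
    print(sample_penalty(5, 3, 0.70, 10))
```

With the cap of 4 and a multiplier of 0.25, the largest possible penalty for one sample equals the full value of that sample, which keeps the final score within the 0 to 100 range.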

Reporting and Understanding the Percentage of Agreement (PA)

In addition to the final AI score, the report includes the Percentage of Agreement (PA), calculated based on the consensus among human readers for a specific age.
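
As a rough sketch of how a Percentage of Agreement could be derived from a set of human readings, the Python below counts how many readers assigned a given age. The function names, the ten readings, and the tie-breaking rule for the Modal Age (lowest age wins) are assumptions made for illustration; the document does not state how SmartDots resolves ties.

```python
from collections import Counter


def modal_age(reader_ages):
    """Most frequent age among the readers.

    Ties are broken by taking the lowest age; this rule is an
    assumption, not documented SmartDots behaviour.
    """
    counts = Counter(reader_ages)
    highest = max(counts.values())
    return min(age for age, n in counts.items() if n == highest)


def percentage_agreement(reader_ages, age):
    """Share of readers (0 to 100) who assigned exactly this age."""
    return 100 * reader_ages.count(age) / len(reader_ages)


if __name__ == "__main__":
    # Hypothetical readings from ten human readers for one fish.
    ages = [3, 3, 3, 3, 3, 3, 3, 4, 4, 2]
    mode = modal_age(ages)
    print(mode, percentage_agreement(ages, mode))
```

The same function applied to the age reported by the AI gives the agreement for that age, which can differ from the agreement for the Modal Age.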

Example Scenario (ten human readers):

Why PA Matters for Benchmarking:



Reporting

To see an example of the benchmarking output, see Annex 1. An API call also exists for demonstration: it accepts an event ID (tblEventID) and a reader name, then submits that reader's readings as if they were an AI algorithm:

https://smartdots.ices.dk/API/getScoringReaderInEvent?tblEventID=1842&reader=balliu

The output includes a graph showing the performance of the "AI algorithm" plotted alongside actual readers of the event:

https://smartdots.ices.dk/API/GetScoringInEventGraphs?tblEventID=1842&AIID=30
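
The two demonstration URLs above can be assembled programmatically. The sketch below only builds the URLs from the endpoint names and query parameters shown in this document; the helper names are hypothetical, and no assumption is made about authentication or about the format of what the endpoints return.

```python
from urllib.parse import urlencode

BASE_URL = "https://smartdots.ices.dk/API"


def reader_scoring_url(event_id, reader):
    """URL that scores a human reader's readings as if they came from an AI."""
    query = urlencode({"tblEventID": event_id, "reader": reader})
    return f"{BASE_URL}/getScoringReaderInEvent?{query}"


def scoring_graph_url(event_id, ai_id):
    """URL of the performance graph for one AI submission in an event."""
    query = urlencode({"tblEventID": event_id, "AIID": ai_id})
    return f"{BASE_URL}/GetScoringInEventGraphs?{query}"


if __name__ == "__main__":
    print(reader_scoring_url(1842, "balliu"))
    print(scoring_graph_url(1842, 30))
```

Using urlencode rather than string concatenation keeps reader names with spaces or special characters correctly escaped.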




Proactive Participation (Ongoing Events)

Registered institutes can deploy algorithms to participate in live workshops alongside human experts.

1. Institutional AI Registration

Institutes can formally register their AI algorithms within SmartDots. Registration can only be completed by a country coordinator.

2. Live Participation Workflow

When a new exchange or workshop is created, the Event Coordinator has a checkbox to allow AI algorithms into the session. If an algorithm is registered for that species and area, it will be permitted to participate.

3. Integrated Reporting & Evaluation




AI Performance Metrics & Reporting Structure

When ages for an event are uploaded, the SmartDots platform compares them against the annotations made during the event, generating an AI Score and an Average Percentage of Agreement.

1. Main Header (Root) Report

| Field | Description |
| --- | --- |
| Success | Indicates the event was successfully located and the report generated. |
| Number_samples_in_event | Total number of distinct fish present in the event. |
| each_sample_value | The weighted value assigned to each individual sample: 100 / Number_samples_in_event |
| AI_Score | The calculated performance score for the AI. |
| Readers_Average_Score | The average calculated performance of all readers. |
| AI_Average_Percentage_Agreement | The overall average agreement across all samples. |
| all_Readers_average_percentage_agreement_for_modal_age | The average percentage of agreement between readers for all FishIDs. |
| number_of_samples_reported_by_AI | The count of samples where the AI successfully provided an age. |
| Invalid_Fish_Sent_By_AI | A list of any fish IDs submitted with errors or invalid data. |
| message | Status confirmation (e.g., "Report processed and scored successfully"). |
| see_performance_... | URL link to visual performance graphs comparing AI against human readers. |

Example JSON Response

{
  "success": true,
  "number_samples_in_event": 162,
  "each_sample_value": 0.617283950617284,
  "aI_finalScore": 97.479383,
  "aI_AveragePercentageAgreement": 82.7159259259259,
  "all_fishID_average_percentage_agreement_for_modal_age": 85.0822222222222,
  "number_of_samples_reported_by_AI": 160,
  "invalid_Fish_Sent_By_Ai": [],
  "message": "Report processed and scored successfully.",
  "see_performance_algorithm_against_readers": "https://smartdots.ices.dk/API/GetScoringInEventGraphs?tblEventID=1842&AIID=34"
}
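
A client receiving this response might run a few sanity checks before trusting the score. The sketch below is one hypothetical way to do that in Python, using the key names exactly as they appear in the example response above; the function name and the choice of checks are assumptions.

```python
import json

# Subset of the example response above, with keys copied as shown.
EXAMPLE_RESPONSE = """
{
  "success": true,
  "number_samples_in_event": 162,
  "each_sample_value": 0.617283950617284,
  "number_of_samples_reported_by_AI": 160,
  "invalid_Fish_Sent_By_Ai": []
}
"""


def summarize_report(raw_json):
    """Basic consistency checks on a scoring report."""
    report = json.loads(raw_json)
    n_event = report["number_samples_in_event"]
    n_ai = report["number_of_samples_reported_by_AI"]
    expected_value = 100 / n_event  # each_sample_value per the field table
    return {
        "sample_value_matches": abs(report["each_sample_value"] - expected_value) < 1e-9,
        "coverage_percent": 100 * n_ai / n_event,
        "has_invalid_fish": bool(report["invalid_Fish_Sent_By_Ai"]),
    }


if __name__ == "__main__":
    print(summarize_report(EXAMPLE_RESPONSE))
```

In the example, the AI reported 160 of 162 samples, so the coverage check flags that two fish are missing even though no invalid fish IDs were sent.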

This is an example graph for event 1842, where human readers are the blue bars and the AI algorithm is the green one:

[Figure: Performance graph for event 1842, human readers (blue) vs. AI algorithm (green)]

By Modal Age Reporting

| Field | Description |
| --- | --- |
| tblEventID | The event identifier. |
| modalAge | The modal age of this record. |
| average_Percentage_agreement_between_readers | The average percentage of agreement between readers for that modal age. |
| number_samples_with_that_modal_age | Number of samples with that modal age. |
| number_samples_read_AI | Number of samples that the AI read in that modal age group. |
| average_percentage_agreement_for_AI_for_modal_age | The average percentage of agreement for the AI for that modal age. |

Individual Fish Reporting

A detailed record is generated for every fish in the event.

| Field | Description |
| --- | --- |
| FishID | The unique identifier for the fish sample. |
| Status | Reported, Missing, or Not Found. |
| AI_reported_Age | The age reported by the AI for this fish. |
| Readers_Modal_Age | The modal age according to the readers of the event. |
| Percentage_of_agreement_modalAge | The percentage of agreement for the modal age. |
| Percentage_of_agreement_reported_age | The percentage of agreement for the AI-reported age. |
| Score | The score for that individual fish. |



Conclusion

The integration of AI into SmartDots represents a significant evolution in biological data processing within the ICES community. This roadmap achieves three critical goals:

As this roadmap progresses over the next two years, the continuous feedback loop between institutional AI models and SmartDots' reporting tools will pave the way for more efficient, consistent, and scalable biological readings across all member institutes.





Annex 1 — Example Output Report of an AI Age Reading
{
  "success": true,
  "number_samples_in_event": 162,
  "each_sample_value": 0.617283950617284,
  "aI_Score": 97.2,
  "readers_Average_Score": 97.3,
  "aI_Average_Percentage_Agreement": 82.7159259259259,
  "all_Readers_average_percentage_agreement_for_modal_age": 85.0822222222222,
  "number_of_samples_reported_by_AI": 160,
  "invalid_Fish_Sent_By_Ai": [],
  "message": "Report processed and scored successfully.",
  "see_performance_algorithm_against_readers": "https://smartdots.ices.dk/API/GetScoringInEventGraphs?tblEventID=1842&AIID=34",
  "performance_AI_Per_Modal_Age": [
    { "tblEventID": 1842, "modalAge": 0,  "average_Percentage_agreement_between_readers": 55.56,  "number_samples_with_that_modal_age": 3,  "number_samples_read_AI": 1,  "average_pergentage_agreement_for_AI_for_modal_age": 16.67 },
    { "tblEventID": 1842, "modalAge": 1,  "average_Percentage_agreement_between_readers": 92.86,  "number_samples_with_that_modal_age": 7,  "number_samples_read_AI": 8,  "average_pergentage_agreement_for_AI_for_modal_age": 87.5 },
    { "tblEventID": 1842, "modalAge": 2,  "average_Percentage_agreement_between_readers": 97.92,  "number_samples_with_that_modal_age": 8,  "number_samples_read_AI": 8,  "average_pergentage_agreement_for_AI_for_modal_age": 97.92 },
    { "tblEventID": 1842, "modalAge": 3,  "average_Percentage_agreement_between_readers": 91.67,  "number_samples_with_that_modal_age": 16, "number_samples_read_AI": 17, "average_pergentage_agreement_for_AI_for_modal_age": 89.22 },
    { "tblEventID": 1842, "modalAge": 4,  "average_Percentage_agreement_between_readers": 83.93,  "number_samples_with_that_modal_age": 28, "number_samples_read_AI": 25, "average_pergentage_agreement_for_AI_for_modal_age": 88.0 },
    { "tblEventID": 1842, "modalAge": 5,  "average_Percentage_agreement_between_readers": 88.51,  "number_samples_with_that_modal_age": 29, "number_samples_read_AI": 31, "average_pergentage_agreement_for_AI_for_modal_age": 86.02 },
    { "tblEventID": 1842, "modalAge": 6,  "average_Percentage_agreement_between_readers": 87.78,  "number_samples_with_that_modal_age": 15, "number_samples_read_AI": 15, "average_pergentage_agreement_for_AI_for_modal_age": 87.78 },
    { "tblEventID": 1842, "modalAge": 7,  "average_Percentage_agreement_between_readers": 86.36,  "number_samples_with_that_modal_age": 11, "number_samples_read_AI": 11, "average_pergentage_agreement_for_AI_for_modal_age": 86.36 },
    { "tblEventID": 1842, "modalAge": 8,  "average_Percentage_agreement_between_readers": 77.78,  "number_samples_with_that_modal_age": 12, "number_samples_read_AI": 12, "average_pergentage_agreement_for_AI_for_modal_age": 77.78 },
    { "tblEventID": 1842, "modalAge": 9,  "average_Percentage_agreement_between_readers": 88.10,  "number_samples_with_that_modal_age": 7,  "number_samples_read_AI": 7,  "average_pergentage_agreement_for_AI_for_modal_age": 88.10 },
    { "tblEventID": 1842, "modalAge": 10, "average_Percentage_agreement_between_readers": 83.33,  "number_samples_with_that_modal_age": 6,  "number_samples_read_AI": 7,  "average_pergentage_agreement_for_AI_for_modal_age": 78.57 },
    { "tblEventID": 1842, "modalAge": 11, "average_Percentage_agreement_between_readers": 71.11,  "number_samples_with_that_modal_age": 15, "number_samples_read_AI": 14, "average_pergentage_agreement_for_AI_for_modal_age": 72.62 },
    { "tblEventID": 1842, "modalAge": 12, "average_Percentage_agreement_between_readers": 83.33,  "number_samples_with_that_modal_age": 4,  "number_samples_read_AI": 4,  "average_pergentage_agreement_for_AI_for_modal_age": 83.33 },
    { "tblEventID": 1842, "modalAge": 14, "average_Percentage_agreement_between_readers": 83.33,  "number_samples_with_that_modal_age": 1,  "number_samples_read_AI": 0,  "average_pergentage_agreement_for_AI_for_modal_age": 0 }
  ],
  "detailed_scores_per_fishID": [ "... (162 records) ..." ]
}
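
One way a developer might use the per-modal-age block of this report is to list the age groups where the AI agrees with the readers less than the readers agree with each other. The Python below is a hypothetical sketch: the function name is invented, the four records are copied from the report above, and the key names (including the spelling "pergentage") are used exactly as they appear there.

```python
AI_KEY = "average_pergentage_agreement_for_AI_for_modal_age"  # spelled as in the report
READERS_KEY = "average_Percentage_agreement_between_readers"

# Four records copied from the Annex 1 report above.
RECORDS = [
    {"modalAge": 0, READERS_KEY: 55.56, "number_samples_read_AI": 1, AI_KEY: 16.67},
    {"modalAge": 4, READERS_KEY: 83.93, "number_samples_read_AI": 25, AI_KEY: 88.0},
    {"modalAge": 10, READERS_KEY: 83.33, "number_samples_read_AI": 7, AI_KEY: 78.57},
    {"modalAge": 14, READERS_KEY: 83.33, "number_samples_read_AI": 0, AI_KEY: 0},
]


def modal_ages_where_ai_trails(records):
    """Modal ages where the AI's average agreement is below the readers'.

    Groups in which the AI read no samples are skipped, because their
    AI agreement of 0 reflects missing readings rather than wrong ones.
    """
    return [
        record["modalAge"]
        for record in records
        if record["number_samples_read_AI"] > 0
        and record[AI_KEY] < record[READERS_KEY]
    ]


if __name__ == "__main__":
    print(modal_ages_where_ai_trails(RECORDS))
```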



Annex 2 — Example API Call to Report AI Readings

This is an example of the JSON body for the POST request to http://smartdots.ices.dk/API/submitFishAges:

{
  "email": "researcher@example.com",
  "tblEventID": 1842,
  "institute": "Test",
  "comments": "this is a test",
  "nameAIAlgorithm": "TestAI",
  "version": "0.02",
  "reports": [
    { "fishID": "FISH-99A",  "age": 4 },
    { "fishID": "FISH-102B", "age": 7 },
    { "fishID": "FISH-205C", "age": 2 }
  ]
}
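
A minimal Python sketch of preparing this POST request with the standard library is shown below. The endpoint URL and field names are taken from the example above; the helper name, the Content-Type header, and the absence of any authentication are assumptions, so the request is only built here and not sent.

```python
import json
import urllib.request

SUBMIT_URL = "http://smartdots.ices.dk/API/submitFishAges"  # as given in this annex


def build_submission(email, event_id, institute, algorithm, version,
                     ages, comments=""):
    """Build (but do not send) the POST request carrying the AI readings.

    ages maps fishID to the predicted age.
    """
    payload = {
        "email": email,
        "tblEventID": event_id,
        "institute": institute,
        "comments": comments,
        "nameAIAlgorithm": algorithm,
        "version": version,
        "reports": [{"fishID": fish_id, "age": age} for fish_id, age in ages.items()],
    }
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


if __name__ == "__main__":
    request = build_submission(
        "researcher@example.com", 1842, "Test", "TestAI", "0.02",
        {"FISH-99A": 4, "FISH-102B": 7, "FISH-205C": 2},
        comments="this is a test",
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```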