ICES - SmartDots

About this page

This document describes how AI algorithms can be integrated and tested within the SmartDots platform, the ICES international tool for standardizing fish age determination. It covers two integration modes: retrospective benchmarking against finalized expert events (used as Gold Standard datasets), and proactive participation in live workshops alongside human readers. The document also details the scoring framework used to evaluate AI performance, including the penalty formula, reporting structure, and API endpoints for submitting and retrieving results.

Roadmap for AI Integration in SmartDots Platform

(Leveraging SmartDots Frameworks for the Validation of AI Ageing Algorithms)

SmartDots is the international platform designed to standardize the age determination of fish. It currently also supports exchanges, workshops, and training events for maturity determination and for larvae and egg identification. This AI integration focuses on age-reading events. Age reading usually involves counting the "annuli" (annual growth rings) in calcified structures, most commonly otoliths (ear stones), although for some species other structures such as scales or vertebrae are used.

Because interpreting these rings is subjective, Exchanges and Workshops are essential for quality control.

SmartDots Exchanges

An Exchange is a remote, large-scale event where multiple researchers from different countries read the same set of fish images independently.

How it works: A coordinator uploads a set of otolith images to SmartDots. Readers from various institutes log in and annotate the images (placing "dots" on the rings they identify).
Purpose: To assess the precision and bias between different labs or readers.
Outcome: If most readers agree on the age, a Modal Age (the most frequent age assigned) is established. Significant disagreement signals that the stock may be difficult to read or that different labs are applying different criteria.

SmartDots Workshops

A Workshop is a more intensive, often face-to-face (or synchronized virtual) meeting that usually follows an exchange.

How it works: Experts meet to discuss the results of a previous exchange, examining images where readers disagreed and working to reach consensus on why a specific ring was counted or ignored (e.g., distinguishing "false rings" from "true winter rings").
Purpose: To calibrate readers and harmonize reading rules across all international labs.
Outcome: Updated age-reading manuals and a "Reference Collection" of images with known or agreed-upon ages.

Why Use Exchanges and Workshops as Benchmarks for AI Performance?

SmartDots exchanges and workshops provide a rigorous, standardized framework ideal for testing AI performance, because they offer a "Gold Standard" dataset. The biggest hurdle in AI development is often the lack of high-quality, labeled data. Because workshops are designed specifically to reach consensus on the most difficult samples, the resulting Modal Age serves as a peer-reviewed benchmark. An AI can be inserted into an exchange as if it were a human participant, allowing coordinators to directly compare the algorithm's accuracy, precision, and bias against a panel of international experts under identical conditions.

Furthermore, since exchanges involve samples from different institutes, regions, and fish stocks, they test the AI's robustness and generalization — crucial to ensuring an algorithm doesn't merely "memorize" one type of otolith but actually learns the biological features of growth rings. The statistical metrics already built into SmartDots — such as Coefficient of Variation (CV) and Percentage Agreement (PA) — provide an immediate, standard scorecard for the AI's reliability.

Using these events to test AI creates a clear pathway for operational integration. AI developers can identify exactly where an algorithm fails (e.g., misidentifying "false rings" or "edge growth"), and expert discussions become the specific training data used to improve the next model version — eventually leading to a fully calibrated and trusted automated system.

How to Integrate an AI Reader in SmartDots?

SmartDots utilizes a modular architecture where the software and database communicate via an API. This allows AI algorithms to interact with the system exactly like human users. To maintain data integrity, AI agents are assigned unique identifiers. While human readers are anonymized by country (e.g., DK01, DK02), algorithms are always identified as AI01, AI02, etc., ensuring humans and AI agents are never confused in datasets.

There are two primary ways for an AI algorithm to operate within SmartDots: with past events (retrospective) or with ongoing events (proactive).

Retrospective Benchmarking (Closed/Public Events)

To facilitate benchmarking AI algorithms against past events, the following has been prepared:

This technical guide for AI testing.
API calls for:
- Retrieving a list of all public events: https://smartdots.ices.dk/API/getListEvents
- Accessing sample metadata and images for a specific event: https://smartdots.ices.dk/API/getListSamples?tblEventID=74
- Uploading age "hints" and receiving an automated performance score: http://smartdots.ices.dk/API/submitFishAges (POST request)
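
The two read-only calls above can be wrapped in small helper functions. The following is a minimal sketch in Python using only the standard library. It assumes that the endpoints reply with JSON, which this guide does not state explicitly, and the helper names (build_url, fetch_json) are illustrative rather than part of SmartDots.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://smartdots.ices.dk/API"  # base of the endpoints listed above


def build_url(endpoint, **params):
    """Return the full URL for an endpoint, with optional query parameters."""
    url = f"{BASE_URL}/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(url, timeout=30):
    """GET a URL and decode the body as JSON (assumption: the API replies with JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


events_url = build_url("getListEvents")
samples_url = build_url("getListSamples", tblEventID=74)
print(events_url)
print(samples_url)

# The actual requests need network access, so they are left commented out:
# events = fetch_json(events_url)
# samples = fetch_json(samples_url)
```

Keeping URL construction separate from the network call makes it possible to check the request parameters offline before contacting the server.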

Data Licensing and Access: No registration is required to access past events, but an AI algorithm can only access public events. Since 2026, all closed workshops and exchanges fall under the CC BY 4.0 license, so any AI algorithm can access these public data via the SmartDots API.

Workflow: The AI performs a "blind" upload of its predictions. SmartDots then automatically calculates a score based on the historical consensus. Since these events are closed, the Modal Age and Percent Agreement are locked, providing a "Gold Standard" for validation.

Retrospective AI Benchmarking for Closed Events

Once a SmartDots workshop or exchange is concluded and results are made public, the event serves as a Gold Standard dataset. The fixed consensus reached by human experts provides the perfect environment for testing and ranking AI algorithms.

1. The Workflow: Testing Against History

Accessing Closed Events: Developers access finalized events where the Modal Age and Percent Agreement are already locked and immutable.
The "Blind" Upload: The AI algorithm uploads its age predictions ("hints") for all samples without knowing the human results beforehand.
Automated Ranking: The SmartDots system compares these AI predictions against the archived expert results to generate the final score.
Percentage of Agreement: The benchmark report also includes the average Percentage of Agreement that the algorithm achieved for that event. For each sample, this is the share of human readers who assigned the same age as the one the algorithm reported.

AI Performance Scoring Framework: SmartDots Integration

This framework evaluates an AI algorithm's ability to predict fish age by comparing its outputs against the Modal Age (the consensus reached by expert human readers). The system uses a weighted penalty logic that scales based on the number of samples in the event and the confidence level of the human experts.

This ranking system is unique because it penalizes the AI based on human certainty: if the experts all agree (high agreement) and the AI is wrong, the penalty is higher. If the experts are unsure (low agreement), the penalty is lower. Scores are relative to dataset size, ensuring a small event and a large event are both graded on a 0–100 scale.

1. The Scoring Philosophy

Performance is measured on a scale from 0 to 100, where 100 represents a perfect match with the human expert consensus.

Weighted Samples: Every sample is worth an equal portion of the 100-point total:
Weight per fish sample = 100 / Total number of fish in the event
Example: In a 20-sample event, each fish is worth 5 points. In a 400-sample event, each fish is worth 0.25 points.
Confidence-Based Penalties: If the AI prediction is incorrect, points are deducted proportionally to how confident the human consensus was.

2. The Penalty Formula

For every sample where the AI's predicted age does not match the Modal Age, a penalty is deducted from the total score.

Variables:

Distance (D): The absolute difference between the AI guess and the Modal Age. Capped at a maximum of 4.
Confidence (C): The Percent Agreement (PA) of the human readers for that sample, expressed as a decimal (e.g., 90% = 0.9).
Sample Value (V): The points assigned to each fish (100 / Total Samples).
Multiplier (M): Set at 0.25. If the AI is wrong by 4 and the PA is 100%, all points for that sample are removed (4 × 0.25 × 1 = 1).

Penalty per Sample = M × D × C × V
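
The formula translates directly into a short function. The sketch below is a plain Python rendering of the definitions above, not the SmartDots implementation; the function and parameter names are illustrative.

```python
MULTIPLIER = 0.25   # M, as defined above
MAX_DISTANCE = 4    # cap applied to D


def penalty_per_sample(ai_age, modal_age, percent_agreement, total_samples):
    """Penalty = M x D x C x V for a single sample.

    percent_agreement is the PA as a decimal (0.9 for 90%).
    """
    distance = min(abs(ai_age - modal_age), MAX_DISTANCE)  # D
    sample_value = 100 / total_samples                      # V
    return MULTIPLIER * distance * percent_agreement * sample_value


# Wrong by 4 with 100% agreement in a 20-sample event: the full 5 points are lost.
worst_case = penalty_per_sample(ai_age=9, modal_age=5, percent_agreement=1.0, total_samples=20)
# Wrong by 1 with 60% agreement in the same event.
mild_case = penalty_per_sample(ai_age=6, modal_age=5, percent_agreement=0.6, total_samples=20)
print(round(worst_case, 2), round(mild_case, 2))
```

A correct prediction has a distance of 0, so its penalty is 0 regardless of the agreement level.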

3. Final Score Calculation

Maximum Score: 100 (AI matched every Modal Age perfectly).
Minimum Score: 0 (cannot go below zero).

Final Score = 100 − Σ Penalties

4. Scoring Examples (Based on 10 Fish Samples)

Each sample is worth 10 points (100 / 10 = 10). Modal Age for the fish sample is 3.

AI Guess	Distance to Modal Age	Percentage of Agreement	Points Deducted
3	0	100%	0.25 × 0 × 1 × 10 = 0
4	1	50%	0.25 × 1 × 0.5 × 10 = 1.25
5	2	70%	0.25 × 2 × 0.7 × 10 = 3.5
7	4 (Cap)	100%	0.25 × 4 × 1 × 10 = 10
8	4 (Cap)	35%	0.25 × 4 × 0.35 × 10 = 3.5
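
The deductions can also be computed programmatically. The sketch below applies the penalty formula to the five example guesses and then derives a final score, assuming purely for illustration that the remaining five samples of the ten-sample event match the Modal Age. The function name and the tuple layout are illustrative, and the handling of samples the AI did not report is not covered by this sketch.

```python
def final_score(predictions, total_samples, multiplier=0.25, max_distance=4):
    """Return (score, penalties) for a list of (ai_age, modal_age, pa_decimal) tuples."""
    sample_value = 100 / total_samples
    penalties = [
        multiplier * min(abs(ai - modal), max_distance) * pa * sample_value
        for ai, modal, pa in predictions
    ]
    # The score cannot go below zero.
    return max(0.0, 100 - sum(penalties)), penalties


# The five example guesses (Modal Age 3), followed by five assumed exact matches.
rows = [(3, 3, 1.0), (4, 3, 0.5), (5, 3, 0.7), (7, 3, 1.0), (8, 3, 0.35)]
matches = [(3, 3, 1.0)] * 5

score, penalties = final_score(rows + matches, total_samples=10)
print([round(p, 2) for p in penalties[:5]], round(score, 2))
```

The last example row shows the effect of the cap: the guess is 5 years away from the Modal Age, but the distance used in the formula is 4.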

Reporting and Understanding the Percentage of Agreement (PA)

In addition to the final AI score, the report includes the Percentage of Agreement (PA), calculated based on the consensus among human readers for a specific age.

Example Scenario (ten human readers):

5 readers classify the age as 5 (50% agreement)
3 readers classify the age as 6 (30% agreement)
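2 readers classify the age as 4 (20% agreement)

In this scenario the Modal Age is 5, with a PA of 50%. An AI algorithm reporting age 6 for this sample would be credited with 30% agreement for its reported age.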
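
The percentages in a scenario like this one follow directly from the raw readings. The sketch below derives them with Python's collections.Counter; the function names are illustrative, and the tie-breaking behaviour noted in the comment is a property of this sketch, not a statement about how SmartDots resolves ties.

```python
from collections import Counter


def agreement_by_age(readings):
    """Map each assigned age to the percentage of readers who chose it."""
    counts = Counter(readings)
    return {age: 100 * n / len(readings) for age, n in counts.items()}


def modal_age_and_pa(readings):
    """Return (modal_age, percent_agreement) for one sample."""
    # On ties, most_common(1) returns the age encountered first in the list.
    age, n = Counter(readings).most_common(1)[0]
    return age, 100 * n / len(readings)


readings = [5, 5, 5, 5, 5, 6, 6, 6, 4, 4]  # the ten readers of the scenario
print(modal_age_and_pa(readings))   # (5, 50.0)
print(agreement_by_age(readings))   # {5: 50.0, 6: 30.0, 4: 20.0}
```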

Why PA Matters for Benchmarking:

High Consensus (>90% PA): These are "high-quality" events where most readers agree — ideal datasets for validating AI.
Low Consensus (~50% PA): These events indicate significant disagreement. It is generally advisable to avoid testing AI algorithms on such events, as the "ground truth" is inherently unstable.

Reporting

To see an example of the benchmarking output, see Annex 1. An API call also exists for demonstration: it accepts an event ID (tblEventID) and a reader name, then scores that reader's readings as if they had been submitted by an AI algorithm:

https://smartdots.ices.dk/API/getScoringReaderInEvent?tblEventID=1842&reader=balliu

The output includes a graph showing the performance of the "AI algorithm" plotted alongside actual readers of the event:

https://smartdots.ices.dk/API/GetScoringInEventGraphs?tblEventID=1842&AIID=30

Proactive Participation (Ongoing Events)

Registered institutes can deploy algorithms to participate in live workshops alongside human experts.

Workflow: The Event Coordinator has an option to allow registered AI algorithms to annotate in an event. The AI will be called on the day the event opens to register its readings.
Blind Protocol: AI readings are treated like any other readings and remain hidden until the event is closed. Only the event organizer and delegates can see the AI readings before the event closes.
Reporting: The event coordinator can produce a report and evaluate AI performance using the method of their choice, and can even opt to leave the AI readings out of the report.

1. Institutional AI Registration

Institutes can formally register their AI algorithms within SmartDots. Registration can only be completed by a country coordinator.

Registration Profile: Each algorithm is linked to a country coordinator and includes metadata such as the algorithm name and the specific species it is designed for (e.g., Cod or Herring).

2. Live Participation Workflow

When a new exchange or workshop is created, the Event Coordinator has a checkbox to allow AI algorithms into the session. If an algorithm is registered for that species and area, it will be permitted to participate.

Reading: The AI performs its readings on the first day of the event and must submit annotations and age guesses before the end of that day.
The "Blind" Protocol: AI readings can be kept hidden from human readers until the end of the event to prevent any influence on the human blind-reading process.

3. Integrated Reporting & Evaluation

Dynamic Reporting: The organizer can generate reports that include or exclude AI data, enabling human-vs-AI comparison reports or pure "Human Gold Standard" reports.
Organizer Responsibility: The Event Organizer alone decides the criteria for evaluating AI performance and whether the AI readings count toward the final consensus.
Custom Success Metrics: While the platform provides the standard penalty-based score, the organizer can apply their own specific thresholds or statistical filters.

AI Performance Metrics & Reporting Structure

When ages for an event are uploaded, the SmartDots platform compares them against the annotations made during the event, generating an AI Score and an Average Percentage of Agreement.

1. Main Header (Root) Report

Field	Description
Success	Indicates the event was successfully located and the report generated.
Number_samples_in_event	Total number of distinct fish present in the event.
each_sample_value	The weighted value assigned to each individual sample: 100 / Number_samples_in_event
AI_Score	The calculated performance score for the AI.
Readers_Average_Score	The average calculated performance of all readers.
AI_Average_Percentage_Agreement	The average percentage of agreement received by the AI-reported ages across all samples.
all_Readers_average_percentage_agreement_for_modal_age	The average percentage of agreement between readers for all FishIDs.
number_of_samples_reported_by_AI	The count of samples where the AI successfully provided an age.
Invalid_Fish_Sent_By_AI	A list of any fish IDs submitted with errors or invalid data.
message	Status confirmation (e.g., "Report processed and scored successfully").
see_performance_...	URL link to visual performance graphs comparing AI against human readers.

Example JSON Response

{
  "success": true,
  "number_samples_in_event": 162,
  "each_sample_value": 0.617283950617284,
  "aI_finalScore": 97.479383,
  "aI_AveragePercentageAgreement": 82.7159259259259,
  "all_fishID_average_percentage_agreement_for_modal_age": 85.0822222222222,
  "number_of_samples_reported_by_AI": 160,
  "invalid_Fish_Sent_By_Ai": [],
  "message": "Report processed and scored successfully.",
  "see_performance_algorithm_against_readers": "https://smartdots.ices.dk/API/GetScoringInEventGraphs?tblEventID=1842&AIID=34"
}
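
A client can run a few sanity checks on the root report before looking at the score itself. The sketch below parses a trimmed copy of the example response and derives the AI's sample coverage. It uses only keys that appear in the example, and the summarize_report helper and the names of its output fields are illustrative.

```python
import json

# Trimmed copy of the example response, limited to the keys used below.
response_text = """
{
  "success": true,
  "number_samples_in_event": 162,
  "each_sample_value": 0.617283950617284,
  "number_of_samples_reported_by_AI": 160,
  "invalid_Fish_Sent_By_Ai": []
}
"""


def summarize_report(report):
    """Derive simple consistency checks from the root report."""
    n = report["number_samples_in_event"]
    reported = report["number_of_samples_reported_by_AI"]
    return {
        "coverage_percent": 100 * reported / n,
        "missing_samples": n - reported,
        "sample_value_consistent": abs(report["each_sample_value"] - 100 / n) < 1e-9,
        "has_invalid_fish": bool(report["invalid_Fish_Sent_By_Ai"]),
    }


report = json.loads(response_text)
summary = summarize_report(report)
print(summary["missing_samples"], round(summary["coverage_percent"], 2))
```

In this example the AI reported 160 of 162 samples, so two samples are unaccounted for and worth checking before the score is interpreted.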

This is an example graph for event 1842, where human readers are the blue bars and the AI algorithm is the green one:

Figure: Performance graph for event 1842, with human readers in blue and the AI algorithm in green.

By Modal Age Reporting

Field	Description
tblEventID	The event identifier.
modalAge	The modal age of this record.
average_Percentage_agreement_between_readers	The average percentage of agreement between readers for that modal age.
number_samples_with_that_modal_age	Number of samples with that modal age.
number_samples_read_AI	Number of samples for which the AI reported that age.
average_percentage_agreement_for_AI_for_modal_age	The average percentage of agreement received by the AI-reported age across those samples.

Individual Fish Reporting

A detailed record is generated for every fish in the event.

Field	Description
FishID	The unique identifier for the fish sample.
Status	Reported, Missing, or Not Found.
AI_reported_Age	The age reported by the AI for this fish.
Readers_Modal_Age	The modal age according to the readers of the event.
Percentage_of_agreement_modalAge	The percentage of agreement for the modal age.
Percentage_of_agreement_reported_age	The percentage of agreement for the AI-reported age.
Score	The score for that individual fish.
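
Per-fish records are the natural place to look for the errors that cost the most points: samples where the AI disagrees with the Modal Age although the human readers agreed strongly. The sketch below is a hypothetical example. It assumes the records are dictionaries keyed by the field names in the table above (the exact JSON key spelling may differ), and the records themselves are invented for illustration.

```python
def largest_disagreements(records, top=3):
    """Reported fish where the AI age differs from the modal age, strongest consensus first."""
    mismatches = [
        r for r in records
        if r["Status"] == "Reported" and r["AI_reported_Age"] != r["Readers_Modal_Age"]
    ]
    mismatches.sort(key=lambda r: r["Percentage_of_agreement_modalAge"], reverse=True)
    return mismatches[:top]


# Invented records for illustration only; not taken from a real event.
records = [
    {"FishID": "A", "Status": "Reported", "AI_reported_Age": 4,
     "Readers_Modal_Age": 4, "Percentage_of_agreement_modalAge": 90.0},
    {"FishID": "B", "Status": "Reported", "AI_reported_Age": 6,
     "Readers_Modal_Age": 5, "Percentage_of_agreement_modalAge": 60.0},
    {"FishID": "C", "Status": "Reported", "AI_reported_Age": 2,
     "Readers_Modal_Age": 3, "Percentage_of_agreement_modalAge": 100.0},
    {"FishID": "D", "Status": "Missing", "AI_reported_Age": None,
     "Readers_Modal_Age": 7, "Percentage_of_agreement_modalAge": 80.0},
]

worst = largest_disagreements(records)
print([r["FishID"] for r in worst])  # ['C', 'B']
```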

Conclusion

The integration of AI into SmartDots represents a significant evolution in biological data processing within the ICES community. This roadmap achieves three critical goals:

Standardized Benchmarking: By utilizing closed "Gold Standard" events, SmartDots provides a transparent and objective environment for developers to refine their algorithms against verified human consensus.
Collaborative Innovation: The real-time participation workflow allows AI to function as a "digital peer," providing event organizers with supplementary insights without compromising the integrity of the human blind-reading protocol.
Governance and Flexibility: The framework maintains a "human-in-the-loop" philosophy. While AI provides the data, the ultimate responsibility for evaluation, reporting, and final consensus remains with the Event Organizers and Coordinators.

As this roadmap progresses over the next two years, the continuous feedback loop between institutional AI models and SmartDots' reporting tools will pave the way for more efficient, consistent, and scalable biological readings across all member institutes.

Annex 1 — Example Output Report of an AI Age Reading

{
  "success": true,
  "number_samples_in_event": 162,
  "each_sample_value": 0.617283950617284,
  "aI_Score": 97.2,
  "readers_Average_Score": 97.3,
  "aI_Average_Percentage_Agreement": 82.7159259259259,
  "all_Readers_average_percentage_agreement_for_modal_age": 85.0822222222222,
  "number_of_samples_reported_by_AI": 160,
  "invalid_Fish_Sent_By_Ai": [],
  "message": "Report processed and scored successfully.",
  "see_performance_algorithm_against_readers": "https://smartdots.ices.dk/API/GetScoringInEventGraphs?tblEventID=1842&AIID=34",
  "performance_AI_Per_Modal_Age": [
    { "tblEventID": 1842, "modalAge": 0,  "average_Percentage_agreement_between_readers": 55.56,  "number_samples_with_that_modal_age": 3,  "number_samples_read_AI": 1,  "average_pergentage_agreement_for_AI_for_modal_age": 16.67 },
    { "tblEventID": 1842, "modalAge": 1,  "average_Percentage_agreement_between_readers": 92.86,  "number_samples_with_that_modal_age": 7,  "number_samples_read_AI": 8,  "average_pergentage_agreement_for_AI_for_modal_age": 87.5 },
    { "tblEventID": 1842, "modalAge": 2,  "average_Percentage_agreement_between_readers": 97.92,  "number_samples_with_that_modal_age": 8,  "number_samples_read_AI": 8,  "average_pergentage_agreement_for_AI_for_modal_age": 97.92 },
    { "tblEventID": 1842, "modalAge": 3,  "average_Percentage_agreement_between_readers": 91.67,  "number_samples_with_that_modal_age": 16, "number_samples_read_AI": 17, "average_pergentage_agreement_for_AI_for_modal_age": 89.22 },
    { "tblEventID": 1842, "modalAge": 4,  "average_Percentage_agreement_between_readers": 83.93,  "number_samples_with_that_modal_age": 28, "number_samples_read_AI": 25, "average_pergentage_agreement_for_AI_for_modal_age": 88.0 },
    { "tblEventID": 1842, "modalAge": 5,  "average_Percentage_agreement_between_readers": 88.51,  "number_samples_with_that_modal_age": 29, "number_samples_read_AI": 31, "average_pergentage_agreement_for_AI_for_modal_age": 86.02 },
    { "tblEventID": 1842, "modalAge": 6,  "average_Percentage_agreement_between_readers": 87.78,  "number_samples_with_that_modal_age": 15, "number_samples_read_AI": 15, "average_pergentage_agreement_for_AI_for_modal_age": 87.78 },
    { "tblEventID": 1842, "modalAge": 7,  "average_Percentage_agreement_between_readers": 86.36,  "number_samples_with_that_modal_age": 11, "number_samples_read_AI": 11, "average_pergentage_agreement_for_AI_for_modal_age": 86.36 },
    { "tblEventID": 1842, "modalAge": 8,  "average_Percentage_agreement_between_readers": 77.78,  "number_samples_with_that_modal_age": 12, "number_samples_read_AI": 12, "average_pergentage_agreement_for_AI_for_modal_age": 77.78 },
    { "tblEventID": 1842, "modalAge": 9,  "average_Percentage_agreement_between_readers": 88.10,  "number_samples_with_that_modal_age": 7,  "number_samples_read_AI": 7,  "average_pergentage_agreement_for_AI_for_modal_age": 88.10 },
    { "tblEventID": 1842, "modalAge": 10, "average_Percentage_agreement_between_readers": 83.33,  "number_samples_with_that_modal_age": 6,  "number_samples_read_AI": 7,  "average_pergentage_agreement_for_AI_for_modal_age": 78.57 },
    { "tblEventID": 1842, "modalAge": 11, "average_Percentage_agreement_between_readers": 71.11,  "number_samples_with_that_modal_age": 15, "number_samples_read_AI": 14, "average_pergentage_agreement_for_AI_for_modal_age": 72.62 },
    { "tblEventID": 1842, "modalAge": 12, "average_Percentage_agreement_between_readers": 83.33,  "number_samples_with_that_modal_age": 4,  "number_samples_read_AI": 4,  "average_pergentage_agreement_for_AI_for_modal_age": 83.33 },
    { "tblEventID": 1842, "modalAge": 14, "average_Percentage_agreement_between_readers": 83.33,  "number_samples_with_that_modal_age": 1,  "number_samples_read_AI": 0,  "average_pergentage_agreement_for_AI_for_modal_age": 0 }
  ],
  "detailed_scores_per_fishID": [ "... (162 records) ..." ]
}
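
The per-modal-age records lend themselves to simple aggregation. The sketch below computes a sample-weighted mean of reader agreement and, for each modal age, the difference between the AI's sample count and the number of samples with that modal age. It uses three of the records above, trimmed to the keys it needs; the function names are illustrative.

```python
def weighted_reader_agreement(records):
    """Sample-weighted mean of reader agreement across modal-age groups."""
    total = sum(r["number_samples_with_that_modal_age"] for r in records)
    weighted = sum(
        r["average_Percentage_agreement_between_readers"]
        * r["number_samples_with_that_modal_age"]
        for r in records
    )
    return weighted / total


def ai_count_difference(records):
    """Per modal age: AI sample count minus number of samples with that modal age."""
    return {
        r["modalAge"]: r["number_samples_read_AI"] - r["number_samples_with_that_modal_age"]
        for r in records
    }


# Records for modal ages 1, 2 and 4 from the report above, trimmed to the keys used here.
subset = [
    {"modalAge": 1, "average_Percentage_agreement_between_readers": 92.86,
     "number_samples_with_that_modal_age": 7, "number_samples_read_AI": 8},
    {"modalAge": 2, "average_Percentage_agreement_between_readers": 97.92,
     "number_samples_with_that_modal_age": 8, "number_samples_read_AI": 8},
    {"modalAge": 4, "average_Percentage_agreement_between_readers": 83.93,
     "number_samples_with_that_modal_age": 28, "number_samples_read_AI": 25},
]

print(round(weighted_reader_agreement(subset), 2))
print(ai_count_difference(subset))  # {1: 1, 2: 0, 4: -3}
```

A positive difference means the AI count for that age exceeds the number of samples with that modal age; a negative difference means it falls short.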

Annex 2 — Example API Call to Report AI Readings

This is an example of the JSON body for the POST request to http://smartdots.ices.dk/API/submitFishAges:

{
  "email": "researcher@example.com",
  "tblEventID": 1842,
  "institute": "Test",
  "comments": "this is a test",
  "nameAIAlgorithm": "TestAI",
  "version": "0.02",
  "reports": [
    { "fishID": "FISH-99A",  "age": 4 },
    { "fishID": "FISH-102B", "age": 7 },
    { "fishID": "FISH-205C", "age": 2 }
  ]
}
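
A request body of this shape can be assembled and posted with the Python standard library. The following is a minimal sketch, not an official client: it assumes the endpoint expects a JSON body sent with a JSON content type and that it replies with JSON, and the helper names (build_payload, submit) are illustrative.

```python
import json
import urllib.request

SUBMIT_URL = "http://smartdots.ices.dk/API/submitFishAges"  # as given above


def build_payload(email, event_id, institute, algorithm, version, ages, comments=""):
    """Assemble the request body from a {fishID: age} mapping."""
    return {
        "email": email,
        "tblEventID": event_id,
        "institute": institute,
        "comments": comments,
        "nameAIAlgorithm": algorithm,
        "version": version,
        "reports": [{"fishID": fish_id, "age": age} for fish_id, age in ages.items()],
    }


def submit(payload, timeout=60):
    """POST the payload and return the decoded report.

    Assumptions: JSON request body, JSON content type, JSON response.
    """
    request = urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_payload(
    email="researcher@example.com", event_id=1842, institute="Test",
    algorithm="TestAI", version="0.02", comments="this is a test",
    ages={"FISH-99A": 4, "FISH-102B": 7, "FISH-205C": 2},
)
print(json.dumps(payload, indent=2))

# The actual submission needs network access, so it is left commented out:
# report = submit(payload)
```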