Wikimedia
sewikimedia
https://se.wikimedia.org/wiki/Huvudsida
MediaWiki 1.46.0-wmf.26
Projekt:CommonsDB registry/Implementation
100
28508
135836
135834
2026-05-02T13:47:27Z
Ainali
5
/* Selecting license templates */ link to query
135836
wikitext
text/x-wiki
[[File:CommonsDB introduction.webm|thumb|200px|Explainer video of CommonsDB.]]
On this page we collect ideas, thoughts and considerations on how [[m:CommonsDB/Potential for the communities|the potential]] could be realized and what requirements that would place on the Wikimedia infrastructure. Some of the ideas are visualized at a high level in the video to the right. For now, all information goes on this page, but eventually some topics may need separate subpages.
== Checking if a new upload already is in CommonsDB ==
Note that the implementation may vary depending on the method used to upload an image. The UploadWizard may be the main place to implement it, as most other tools either have very experienced users (like OpenRefine users) with better than average knowledge about copyright, or (like the Wikimedia Commons app) are mainly used for uploading original content. However, there are many tools that can upload to Wikimedia Commons, and some tool developers will want to benefit from more features and make their uploads as high quality as possible. It might therefore be worth designing the feature to be generic and reusable through an API or library, so that tools can use this functionality as a service.
=== UploadWizard ===
==== Check the CommonsDB register ====
# After media has been uploaded in the browser, an ISCC code can be generated for each of the submitted files.
# Use the ISCC for each of them to query the CommonsDB register to see if it is known.
## If assets with an exact match or very high similarity are found, suggest a license template/statement based on the result (see below), possibly with a link so that the user can explore/verify.
### If the asset was supplied by Wikimedia, instead show a text like "There is already a similar file on Wikimedia Commons. Only upload your file if you are sure it is adding value." and link to the asset.
## If assets with high similarity are found, provide a link so that the user can explore/verify and, if it turns out to be a match, suggest a license template/statement based on the result.
## If nothing similar is found, do nothing.
Throughout this, log which message was shown and what action the user took (did they abort the upload or continue, and if they continued, was it with the license we suggested or with another one?).
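The branching in the list above can be sketched as a small decision function. This is only an illustration: the two thresholds, the <code>Match</code> fields and the action names are hypothetical, since this page does not define what counts as "very high" or "high" similarity or what the registry response looks like.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the page does not define these values.
VERY_HIGH = 0.95
HIGH = 0.80


@dataclass
class Match:
    similarity: float      # 1.0 means an exact match
    supplier: str          # who declared the asset, e.g. "Wikimedia"
    rights_statement: str  # license URL stored in the registry
    url: str               # where the user can explore/verify the asset


def choose_action(best: Optional[Match]) -> str:
    """Return the name of the message to show for the best registry match."""
    if best is None or best.similarity < HIGH:
        return "none"
    if best.similarity >= VERY_HIGH:
        if best.supplier == "Wikimedia":
            # The file is probably already on Commons.
            return "warn_duplicate_on_commons"
        return "suggest_license"
    # High but not very high similarity: let the user verify first.
    return "link_for_verification"
```

The returned action name is also what would be written to the log, together with what the user did next.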
==== Selecting license templates ====
When checking for similar matches as described above, if a match is found, the rights statement of that asset can be used to select the right license on Wikimedia Commons. One way could be to match the string to an item on Wikidata with that value in official website (P856), then from that item find topic has template (P1424), and get the Commons sitelink from that value, which should be the correct template. Of course, this needs to be verified for all allowed rights statements in the registry, taking particular care with public domain, which may be modeled differently on Commons. For performance reasons, we could possibly make this a cached table instead of querying each time, as the values are unlikely to change very often, if at all.
Example: If the CommonsDB register says that the image has the license:
* https://creativecommons.org/licenses/by-sa/2.5/it/
we can query the Wikidata API with a query like this:
* https://www.wikidata.org/w/api.php?action=query&format=json&generator=search&formatversion=2&gsrsearch=haswbstatement%3AP856%3Dhttps%3A%2F%2Fcreativecommons.org%2Flicenses%2Fby-sa%2F2.5%2Fit%2F
This gives us the item Q98929925, for which we can then look up the template this license uses with a query like:
* https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity=Q98929925&property=P1424&formatversion=2
This gives the item Q15304563 from which we can get the template name with:
* https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q15304563&languages=en&formatversion=2
which gives the name of the template on Commons in the sitelinks section, [[c:Template:Cc-by-sa-2.5-it|Template:Cc-by-sa-2.5-it]]. This can be suggested to the user as a suitable license template. The last two steps can also be done with a single SPARQL query, provided that all items from step one are used as values for it, like this: https://query-chest.toolforge.org/redirect/WzZyyJjus4wYSo2IUm0UMAMKY4AoqwOs6oE0o0uAWeg
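The three API calls in the example can be generated from a license URL and item IDs. The sketch below only builds the request URLs (it does no network access or response parsing); the function names are made up for illustration, while the parameters are the ones used in the example URLs.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(license_url: str) -> str:
    """Step 1: find items whose official website (P856) is the license URL."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        "formatversion": "2",
        "gsrsearch": f"haswbstatement:P856={license_url}",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def template_claim_url(item_id: str) -> str:
    """Step 2: read the topic has template (P1424) claim of the license item."""
    params = {
        "action": "wbgetclaims",
        "format": "json",
        "entity": item_id,
        "property": "P1424",
        "formatversion": "2",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def sitelinks_url(template_item_id: str) -> str:
    """Step 3: fetch the template item, whose sitelinks hold the Commons name."""
    params = {
        "action": "wbgetentities",
        "format": "json",
        "ids": template_item_id,
        "languages": "en",
        "formatversion": "2",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"
```

Calling <code>search_url("https://creativecommons.org/licenses/by-sa/2.5/it/")</code> reproduces the first URL of the example.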
Note that for images in the public domain we will need the PD Rationale to pick a good license on Commons, as Commons has so many public domain templates to choose from. If that is not available, I think our best action is just to say that CommonsDB thinks this is in the public domain and leave it up to the user to pick the correct template.
Since these values are unlikely to change very often, it might be better to make a static lookup table that gives the template text for each canonical license URL instead of doing these API calls every time. A table for this is at the [[Projekt:CommonsDB registry/Implementation/License URL to license template|License URL to license template]] page.
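A static lookup could look roughly like the sketch below. The only row is the one from the example above; the normalization rules (forcing <code>https</code> and a trailing slash) are an assumption about how rights statements might vary, not something the registry is known to require.

```python
from typing import Optional

# Single row taken from the example above. A real table would cover every
# rights statement that the registry allows.
LICENSE_TEMPLATES = {
    "https://creativecommons.org/licenses/by-sa/2.5/it/": "Cc-by-sa-2.5-it",
}


def normalize(url: str) -> str:
    """Normalize scheme and trailing slash (assumed normalization rules)."""
    url = url.strip()
    if url.startswith("http://"):
        url = "https://" + url[len("http://"):]
    if not url.endswith("/"):
        url += "/"
    return url


def suggest_template(rights_statement: str) -> Optional[str]:
    """Return the Commons template name, or None if we have no suggestion."""
    return LICENSE_TEMPLATES.get(normalize(rights_statement))
```

Returning <code>None</code> covers the public domain case described above, where the choice of template is left to the user.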
Since the license selection comes on the next page in the upload workflow, we likely need to carry the information over there, especially for the case when the user is uploading multiple images having different licenses in CommonsDB.
==== Store ISCC ====
The ISCC code should be stored as Structured Data: there is already a property for this, {{P|P13150}}. It should possibly be qualified with Wikimedia page-version URL (P7569) so that it is clear which uploaded file the ISCC is for.
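As a rough sketch, the statement could be assembled as below. It follows the general shape of Wikibase statement JSON, but the exact payload that the Commons API accepts for these two properties is an assumption and should be checked before use. The values in the usage note are placeholders, not real identifiers.

```python
def iscc_statement(iscc: str, page_version_url: str) -> dict:
    """Build a statement with the ISCC (P13150) qualified by P7569."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": "P13150",
            "datavalue": {"value": iscc, "type": "string"},
        },
        "qualifiers": {
            # Ties the ISCC to one specific uploaded version of the file.
            "P7569": [
                {
                    "snaktype": "value",
                    "property": "P7569",
                    "datavalue": {"value": page_version_url, "type": "string"},
                }
            ]
        },
    }
```

For example, <code>iscc_statement("ISCC:PLACEHOLDER", "https://commons.wikimedia.org/w/index.php?oldid=0")</code> yields a dictionary that can be serialized with <code>json.dumps</code>.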
=== Prototyping options ===
[[Fil:CommonsDB Wikimedia Nortwestern Europe hackathon - showcase video.webm|miniatyr|Demo of a user script generating an ISCC code and then checking the CommonsDB registry.]]
An alternate workflow where a user uploads to a tool on Toolforge could be created, where that tool generates the ISCC, does the check against the CommonsDB API and recommends a licensing template based on the result. Then it either makes the upload completely separately or forwards the information to the UploadWizard.
A somewhat more integrated workflow would be to create a user script. It could then "embed" itself in the regular workflow and would not feel that different for the user. Possibly, parts of the work behind the scenes could still benefit from a tool on Toolforge to generate the ISCC.
The first option is likely easier to build, whereas the second one would be a more convincing demo for the end user. It might also make it clearer to WMF developers exactly where in the upload process the tool could be integrated.
=== Final implementation ===
Since the search API for the CommonsDB registry needs an API key that shouldn't be shared, a gadget or user script is not really possible, as the key would be visible to anyone. At the same time, we need to investigate what requirements follow from the fact that this is a service that is neither part of the Wikimedia ecosystem nor open source. Likely, we need configuration for the user, possibly even making it opt-in with a clear message about the nature of the API.
== Declaring new uploads to the registry ==
We probably want some kind of "grace period" before we declare a new upload to the registry. This period will allow poor uploads to be speedily deleted and some of the most common bots to add metadata in structured form to the file (examples: [https://commons.wikimedia.org/w/index.php?title=File%3ACecropia_telenitida_448681395.jpg&diff=1101917792&oldid=1101059604 SchlurcherBot], [https://commons.wikimedia.org/w/index.php?title=File:Hamr%C3%A5ngefj%C3%A4rden_20250822_165319.jpg&diff=prev&oldid=1081728006 BotMultichillT]). The length of this period should be checked with the community. It should be long enough that the bots are likely to have added their metadata, which minimizes the need for updating declarations.
After the period has passed, an upload can be considered stable enough and a declaration should be made.
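The grace period check itself is simple. In the sketch below the seven-day length is purely a placeholder, since the actual length is still to be agreed with the community.

```python
from datetime import datetime, timedelta, timezone

# Placeholder value; the real length should be decided with the community.
GRACE_PERIOD = timedelta(days=7)


def ready_to_declare(uploaded_at: datetime, now: datetime, deleted: bool = False) -> bool:
    """True if the file still exists and the grace period has passed."""
    return not deleted and now - uploaded_at >= GRACE_PERIOD
```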
=== Store declaration status ===
The fact that a declaration has been made should be stored somewhere, possibly as Structured Data. Storing the Declaration-ID as a main statement will make it easy to query for, and also makes sense as it is a kind of external identifier. Optionally, there could also be a qualifier on either the ISCC statement or this one, connecting the two to each other (so that it is clear that the ISCC gave cause to the declaration ID). The Declaration-ID can be used to look up the declaration in the CommonsDB API, which returns JSON. Example: https://api.commonsdb.org/v1/metadata-pub/bbqjcaww2wlkm5lnswq6rmsejzxnscz3ynvyvly2o2pnanrjxng3yftpb
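Going by the example URL, a link from a stored Declaration-ID to its public metadata can be built as follows. The endpoint prefix is copied from the example; that every Declaration-ID resolves under the same path is an assumption.

```python
from urllib.parse import quote

# Prefix taken from the example URL above.
METADATA_ENDPOINT = "https://api.commonsdb.org/v1/metadata-pub/"


def declaration_url(declaration_id: str) -> str:
    """Return the public metadata URL for a Declaration-ID."""
    return METADATA_ENDPOINT + quote(declaration_id, safe="")
```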
=== Prototyping options ===
This could be built as a prototype on Toolforge, in three steps. The first step is to identify suitable images to declare, generate the ISCC and put them in a queue. The second step, which we can enable when we feel that the first step is stable enough, is to do the declaration. The third step is to store the declaration status and ISCC on Toolforge in a way that can be inspected, and to show a suggested edit to Wikimedia Commons. When the needed properties have been created, those edits could be made to the file.
== Updating metadata in the registry ==
Sometimes the information on the file page or in its structured data is updated in a way that should also be reflected in the original declaration. In particular, changes of the license are important to propagate.
Detecting changes like this seems nontrivial, and some service that looks for such changes may be needed.
=== Prototyping options ===
Here we could experiment with a Toolforge tool that scans the recent changes on Commons and tries to detect relevant changes (like removing a Creative Commons template and adding a public domain template) on already declared images. In a first step, the goal is just to find the relevant changes and make it easy for us to see how well the detection works. A new declaration can then be made either after human review, or automatically once we are confident that the detection is good enough.
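A first, rough detector could compare the license templates in the old and new wikitext of a revision. The heuristic below (template names starting with <code>cc-</code> or <code>pd-</code>) is an assumption and will miss licenses passed as parameters to wrapper templates, so it is only a starting point for measuring how well detection works.

```python
import re
from typing import Optional

# Captures the template name right after "{{", up to "|" or "}}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def license_templates(wikitext: str) -> set:
    """Return lowercased template names that look like license templates."""
    names = {m.group(1).strip().lower() for m in TEMPLATE_RE.finditer(wikitext)}
    # Assumed heuristic: license templates start with "cc-" or "pd-".
    return {n for n in names if n.startswith(("cc-", "pd-"))}


def license_change(old_text: str, new_text: str) -> Optional[dict]:
    """Return removed/added license templates, or None if nothing changed."""
    before, after = license_templates(old_text), license_templates(new_text)
    if before == after:
        return None
    return {"removed": sorted(before - after), "added": sorted(after - before)}
```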
== Navigating to the registry ==
Files that have a declaration ID or an ISCC code in the structured data could have a link to the registry, either to see what is stored there or to check whether someone else has declared the file.
=== Prototyping options ===
* [[c:User:Ainali/commonsdb-link.js|A user script]] that checks if a file has an ISCC code and adds a link to the CommonsDB explorer if it does. (Add it by editing [[c:Special:MyPage/common.js|your common.js]] page like [https://commons.wikimedia.org/w/index.php?title=User:Ainali/common.js&diff=1151171313&oldid=1035432594 this].)
* Similarly, a user script could be created for the declaration ID once we have that property.
=== Final implementation ===
When there are enough files declared in the registry and their ISCC codes have been added as structured data, evolving the user script into a gadget is probably a good idea.
== Removing(?) declarations of deleted files from the registry ==
If a file gets removed from Wikimedia Commons, the registry should be updated or possibly have the entry removed.
Here, we probably need a tool that listens to the mediawiki.page-delete stream on [[wikitech:Event Platform/EventStreams HTTP Service|EventStreams]] and checks every deleted page against our database of declared files. If we find a match, we should start the process of updating CommonsDB with this information. (That process is still to be decided, but it is likely a declaration with some specific value or status.)
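The matching step of such a listener could be a pure function like the one below, applied to each event after it has been parsed from the stream. The event field names (<code>database</code>, <code>page_namespace</code>, <code>page_title</code>) and the prefixed title format are assumptions that should be checked against the stream's schema. The set of declared titles stands in for our own database.

```python
from typing import Optional

COMMONS_DB_NAME = "commonswiki"
FILE_NAMESPACE = 6  # namespace number of File: pages in MediaWiki


def deleted_declared_file(event: dict, declared_titles: set) -> Optional[str]:
    """Return the title if the event deletes a declared Commons file."""
    if event.get("database") != COMMONS_DB_NAME:
        return None
    if event.get("page_namespace") != FILE_NAMESPACE:
        return None
    title = event.get("page_title")
    return title if title in declared_titles else None
```

A non-<code>None</code> result would then start the (still undecided) process of updating CommonsDB.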
== Find duplicates among existing images ==
This would, in theory, be a one-time job of checking all images on Commons that have the ISCC property against each other to see if there are duplicates. (In theory, because it is unlikely that we will ever be fully caught up, as uploads made through alternative tools may not check for duplicates with ISCC.) It is likely that we do want to keep some images that are very similar to each other, whereas others may be mistakes of different kinds. For the mistakes, the community can decide if they want to delete the "duplicate" images (perhaps leaving redirects). For the ones that should be kept, a way to mark this as intentional would be good. Possibly this could be done with the other versions field in the Information template. Also, files linking to each other using the other versions field can be excluded from the backlog automatically, as they have already been properly dealt with.
=== Suggesting category improvements ===
This is a similar one-time job for all of the images that are kept. It is likely that similar images should have at least one category in common, so a tool to easily make these edits could be made. This is likely a good "game" where users just approve or reject suggestions. If the user decides there should be no category in common, this may be odd to store on the files, which also suggests that an external tool can be useful.
== Find duplicates when preparing a batch upload ==
When doing batch uploads from GLAM partners, it [[w:sv:Wikipedia:Bybrunnen#c-Salgo60-20260315142800-Alicia_Fagerving_(WMSE)-20260313120300|may help]] to generate the ISCC codes pre-upload and check the files to see if there are any potential duplicates to prevent them ''before'' the upload.
=== Prototyping options ===
A local script could generate ISCC codes for a set of images and then check the similarity between them. The script could sort the results so that the most similar images come first for manual review.
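The comparison and ordering part of such a script might look like the sketch below. It assumes that each file has already been reduced to an integer fingerprint and that a smaller Hamming distance means more similar. Producing those fingerprints from real ISCC codes would need an ISCC library and is not shown here.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def rank_pairs(fingerprints: dict) -> list:
    """Return (distance, name_a, name_b) tuples, most similar pairs first."""
    items = sorted(fingerprints.items())
    pairs = [
        (hamming(fa, fb), name_a, name_b)
        for (name_a, fa), (name_b, fb) in combinations(items, 2)
    ]
    return sorted(pairs)
```

A reviewer would then go through the list from the top and stop when the pairs no longer look alike.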
== Find media that has entered the public domain since the upload ==
This is not really related to the ISCC as such, but as we have declared and signed the metadata, we should make efforts to keep that data accurate. In the best of worlds, the community is already on top of this and we would just follow the process for updating the metadata in the registry as described above. However, no tools that assist such workflows are known, so building one may be helpful for the community. A small subset, where the author of the file has a Wikidata item, could be discovered through a SPARQL query by checking if enough time has passed since the author died. This can create a queue for human review, as it might not be possible to determine the exact legal conditions for each image because they vary so much per jurisdiction.
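Once the death year of each author is known (for example from the SPARQL query), building the review queue is a small filter. The 70-year term below is only a default used for illustration. Terms differ between jurisdictions, which is why the result is a queue for human review and not a decision.

```python
def may_be_public_domain(death_year: int, current_year: int, term_years: int = 70) -> bool:
    """True once the calendar year after death_year + term_years has begun."""
    return current_year > death_year + term_years


def review_queue(author_death_years: dict, current_year: int) -> list:
    """Return file names whose author died long enough ago, sorted by name."""
    return sorted(
        name
        for name, death_year in author_death_years.items()
        if may_be_public_domain(death_year, current_year)
    )
```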
== Resolve externally found "conflicts" ==
Here there are two scenarios, an initial check and then a continuous queue.
=== Initial check ===
This would be a one-time job of checking all images on Commons against other files in the registry. Files with an exact match or very high similarity where the rights statement in the registry does not match the license on the file should be put in a queue for human review. Likely some manual investigation is needed to understand why the conflict occurred. Then, either the license is changed, or the file is marked as having the right one, possibly triggering a notification to the other data provider.
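The queueing rule could be sketched as follows. The record fields, the similarity threshold and the choice to also queue files whose rights statement is missing from the lookup table are all assumptions made for the example.

```python
def find_conflicts(matches: list, url_to_template: dict, threshold: float = 0.95) -> list:
    """Return the files whose Commons license disagrees with the registry."""
    queue = []
    for match in matches:
        if match["similarity"] < threshold:
            continue  # not similar enough to count as the same work
        expected = url_to_template.get(match["rights_statement"])
        # Unknown rights statements are queued too, so a human can decide.
        if expected is None or expected.lower() != match["commons_template"].lower():
            queue.append(match["file"])
    return queue
```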
=== Continuous queue ===
Whenever other data providers declare new files in the registry that are exact or highly similar matches of files on Commons but have different rights statements, we should have a way to receive that notification and investigate the file. This could be a VRT queue if it needs to be email, but a public on-wiki list that anyone could help check would be better. For this, a bot receiving the notification could trigger an edit that adds an entry to that list.
[[Kategori:CommonsDB registry|Implementation]]
135838
135837
2026-05-02T15:30:09Z
Ainali
5
/* Find duplicates when preparing a batch upload */ Adding in user request from the hackathon
135838
wikitext
text/x-wiki
[[File:CommonsDB introduction.webm|thumb|200px|Explainer video of CommonsDB.]]
On this page we collect ideas, thoughts and considerations on how [[m:CommonsDB/Potential for the communities|the potential]] could be realized and what requirements that will have on the Wikimedia infrastructure. Some of the ideas are visualized on a high level in the video to the right. For now, all info goes on this page, but eventually they may need separate subpages.
== Checking if a new upload already is in CommonsDB ==
Note that the way this could be implemented may vary depending on the method used to upload an image. The UploadWizard may be the main place to implement it, as most of the other tools either have very experienced users (like OpenRefine users) with better than average knowledge about copyright, or (like the Wikimedia Commons app) mainly feature uploading original content. However, considering that there are many tools that can be used to upload to Wikimedia Commons, and that some tool developers want to benefit from more features and make their uploads as high quality as possible too, it might be worth designing the feature to be generic and reusable through an API or library, so that tools can use this functionality as a service.
=== UploadWizard ===
==== Check the CommonsDB register ====
# After media has been uploaded in the browser, an ISCC code can be generated for each of the submitted files.
# Use the ISCC for each of them to query the CommonsDB register to see if it is known.
## If assets with an exact match or very high similarity are found, suggest a license template/statement based on the result (see below), possibly with a link so that the user can explore/verify.
### If the asset was supplied by Wikimedia, instead show a text like "There is already a similar file on Wikimedia Commons. Only upload your file if you are sure it is adding value." and link to the asset.
## If assets with high similarity are found, provide a link so that the user can explore/verify and, if it was a match, suggest a license template/statement based on the result.
## If it is not, do nothing.
All through this, log which message was shown and what action the user took (did they abort the upload or did they continue, and if so, with the license we showed or something else?).
==== Selecting license templates ====
When checking for similar matches as described above, if a match is found the rights statement of that asset can be used to select the right license on Wikimedia Commons. One way could be to match the string to an item on Wikidata with that value in official website (P856) and then from that item find topic has template (P1424) and get the Commons sitelink from that value, which should be the correct template. Of course, this needs to be verified for all allowed rights statements in the registry, taking particular care with public domain, which may be modeled differently on Commons. For performance reasons, we could possibly make this a cached table and not query for this each time, as the values are unlikely to change very often, if at all.
Example: If the CommonsDB register says that the image has the license:
* https://creativecommons.org/licenses/by-sa/2.5/it/
we can query the Wikidata API with a request like this:
* https://www.wikidata.org/w/api.php?action=query&format=json&generator=search&formatversion=2&gsrsearch=haswbstatement%3AP856%3Dhttps%3A%2F%2Fcreativecommons.org%2Flicenses%2Fby-sa%2F2.5%2Fit%2F
This gives us the item Q98929925, from which we can again query which template this license is using through a query like:
* https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity=Q98929925&property=P1424&formatversion=2
This gives the item Q15304563 from which we can get the template name with:
* https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q15304563&languages=en&formatversion=2
which finds the name of the template on Commons in the sitelinks section, [[c:Template:Cc-by-sa-2.5-it|Template:Cc-by-sa-2.5-it]]. This can be suggested to the user as a suitable license template. The last two bullet points can also be done by a single SPARQL query, provided that all items from step one are used as values for it, like this: https://query-chest.toolforge.org/redirect/WzZyyJjus4wYSo2IUm0UMAMKY4AoqwOs6oE0o0uAWeg
Note that for images in the public domain we will need the PD Rationale to pick a good license on Commons as they have so many public domain templates to choose from. If that is not available, I think our best action is just to say that CommonsDB thinks this is in the public domain and leave it up to the user to pick the correct template.
Note that these values are unlikely to change very often, so instead of doing these API calls every time, it might be better to make a static lookup table where we get the template text for each canonical license URL. A table for this is at [[Projekt:CommonsDB registry/Implementation/License URL to license template|License URL to license template page]].
Since the license selection comes on the next page in the upload workflow, we likely need to carry the information over there, especially for the case when the user is uploading multiple images having different licenses in CommonsDB.
==== Store ISCC ====
The ISCC code should be stored as Structured Data: there is already a property for this, {{P|P13150}}. This should possibly be qualified with Wikimedia page-version URL (P7569) so that it is clear which uploaded file the ISCC is for.
=== Prototyping options ===
[[Fil:CommonsDB Wikimedia Nortwestern Europe hackathon - showcase video.webm|miniatyr|Demo of a user script generating an ISCC code and then checking the CommonsDB registry.]]
An alternate workflow where a user uploads to a tool on Toolforge could be created, where that tool generates the ISCC, does the check against the CommonsDB API and recommends a licensing template based on the result. Then it either makes the upload completely separately or forwards the information to the UploadWizard.
A somewhat more integrated workflow would be to create a user script. This could then "embed" itself in the regular workflow and not feel that different for the user. Possibly, parts of the work behind the scenes could still benefit from a tool on Toolforge to generate the ISCC.
The first option is likely easier to build, whereas the second one would be a more convincing demo for the end user. It might also make it clearer to WMF developers exactly where in the upload process the tool could be integrated.
=== Final implementation ===
Since the search API for the CommonsDB registry needs an API key that shouldn't be shared, a gadget or user script is not really possible, as the key would be visible to anyone. At the same time, it needs to be investigated what requirements follow from the fact that this is a service that is neither in the Wikimedia ecosystem nor open source. Likely, we need configuration for the user, possibly even making it opt-in with a clear message about the nature of the API.
== Declaring new uploads to the registry ==
We probably want some kind of "grace period" before we declare a new upload to the registry. This period will allow for poor uploads to be speedily deleted and for some of the most common bots to add metadata in structured form to the file (examples: [https://commons.wikimedia.org/w/index.php?title=File%3ACecropia_telenitida_448681395.jpg&diff=1101917792&oldid=1101059604 SchlurcherBot], [https://commons.wikimedia.org/w/index.php?title=File:Hamr%C3%A5ngefj%C3%A4rden_20250822_165319.jpg&diff=prev&oldid=1081728006 BotMultichillT]). The length of this period should be checked with the community and be long enough that the bots are likely to have added the metadata, which minimizes the need for updating declarations.
After the period has passed an upload can be considered as stable enough and a declaration should be made.
=== Store declaration status ===
That a declaration has been made should be stored somewhere, possibly as Structured Data. Storing the Declaration-ID as a main statement will make it easy to query for, and also makes sense as it is a kind of external identifier. Optionally, there could also be a qualifier on either the ISCC statement or this one, connecting the two to each other (so that it is clear that the ISCC gave cause to the declaration ID). The Declaration-ID can be used to look up the declaration in the CommonsDB API, which returns a JSON file. Example: https://api.commonsdb.org/v1/metadata-pub/bbqjcaww2wlkm5lnswq6rmsejzxnscz3ynvyvly2o2pnanrjxng3yftpb
=== Prototyping options ===
This could be built as a prototype on Toolforge. It could be done in three steps. The first step is to identify suitable images to declare, generate the ISCC and put them in a queue. The second step, which we can enable when we feel that the first step is stable enough, is to do the declaration. The third step is to store the declaration status and ISCC on Toolforge in a way that can be inspected, and to show a suggested edit to Wikimedia Commons. When the needed properties have been created, those edits could be made to the file.
== Updating metadata in the registry ==
Sometimes the information on the file page or in its structured data is updated in a way that should also be reflected in the original declaration. In particular, changes of the license are important to update.
It seems nontrivial to detect changes like this, and a service that looks for such changes may be needed.
=== Prototyping options ===
Here we could experiment with a Toolforge tool scanning the recent changes on Commons, trying to detect relevant changes (like removing a Creative Commons template and adding a public domain template) on already declared images. In a first step, the goal is just to find the relevant changes and make it easy for us to see how well the detection works. A new declaration can be made either after human review, or automatically when we are confident that the detection is good enough.
== Navigating to the registry ==
Files that have a declaration ID or an ISCC code in the structured data could have a link to the registry, either to see what is stored there or to see if someone else has declared the file.
=== Prototyping options ===
* [[c:User:Ainali/commonsdb-link.js|A user script]] that checks if a file has an ISCC code and adds a link to CommonsDB explorer if it does. (Add by editing [[c:Special:MyPage/common.js|your common.js]] page like [https://commons.wikimedia.org/w/index.php?title=User:Ainali/common.js&diff=1151171313&oldid=1035432594 this].)
* Similarly, a user script could be created for the declaration ID once we have that property.
=== Final implementation ===
When there are enough files declared in the registry and their ISCC codes have been added as structured data, evolving the user script into a gadget is probably a good idea.
== Removing(?) declarations of deleted files from the registry ==
If a file gets removed from Wikimedia Commons, the registry should be updated or possibly have the entry removed.
Here, we probably need a tool that listens to the [[wikitech:Event Platform/EventStreams HTTP Service|EventStreams]] mediawiki.page-delete stream and checks every deleted page against our database of declared files. If we find a match, we should start the process of updating CommonsDB with this information. (That process is still to be decided, but likely a declaration with some specific value or status.)
== Find duplicates among existing images ==
This would, in theory, be a one-time job of checking all images on Commons with the property ISCC against each other to see if there are duplicates. (In theory, because it is unlikely we will ever be fully caught up, as new uploads through alternative upload methods do not check for duplicates with ISCC.) It is likely that we do want some images that are very similar to each other, whereas others may have been mistakes of different kinds. For the mistakes, the community can decide if they want to delete the "duplicate" images (perhaps including redirects). For the ones that should be kept, a way to mark this as intentional would be good. Possibly this could be done with the Other versions field in the Information template. Also, files linking to each other using the Other versions field can be excluded from the backlog automatically as they have already been properly dealt with.
=== Suggesting category improvements ===
This is a similar one-time job for all of the images that are kept. It is likely that similar images should have at least one category in common, so a tool to easily make these edits could be made. This is likely a good "game" where users just approve or reject suggestions. If the user decides there should be no category in common, this may be odd to store on the files, which also suggests that an external tool can be useful.
== Find duplicates when preparing a batch upload ==
When doing batch uploads from GLAM partners, it [[w:sv:Wikipedia:Bybrunnen#c-Salgo60-20260315142800-Alicia_Fagerving_(WMSE)-20260313120300|may help]] to generate the ISCC codes pre-upload and check the files to see if there are any potential duplicates to prevent them ''before'' the upload. This may be helpful for regular users too, to avoid uploading an entire folder where there are some "burst shots" where a manual selection would be better.
=== Prototyping options ===
A local script (or app) that generates ISCC codes for a set of images and then just checks the similarity between them. The script could order the most similar images for manual review.
== Integrating in external tool workflows ==
For small GLAMs, adding requirements for extra checks that have to be done can feel burdensome. Therefore, it would be great if existing tools could be enhanced to help out. For example, could there be an extension to OpenRefine that, for images in a loaded project, both calculates the ISCC (which could then be submitted as structured data) and compares the images to each other?
== Find media that has entered the public domain since the upload ==
This is not really related to the ISCC as such, but as we have declared and signed the metadata, we should make efforts to keep that data accurate. In the best of worlds, the community is already on top of this and we would just follow the process for updating the metadata in the registry as described above. However, no tools that assist such workflows are known, so building one may be helpful for the community. A small subset, where the author of the file has a Wikidata item, could be discovered through a SPARQL query, by checking if enough time has passed since the author died. This can create a queue for human review, as it might not be possible to determine the exact legal conditions for each image automatically, since they vary so much per jurisdiction.
== Resolve externally found "conflicts" ==
Here there are two scenarios, an initial check and then a continuous queue.
=== Initial check ===
This would be a one-time job of checking all images on Commons against other files in the registry. Files with an exact match or very high similarity where the rights statement and the license on the file do not agree should be put in a queue for human review. Likely some manual investigation is needed to understand why this conflict occurred. Then, either the license is changed, or the file is marked as having the right one, possibly triggering a notification to the other data provider.
=== Continuous queue ===
Whenever other data providers declare new files in the registry that are exact or highly similar matches to files on Commons but where the rights statements are not the same, we should have a way to receive that notification and investigate the file. This could be a VRT queue if it needs to be email, but a public on-wiki list that anyone could help check would be better. For this, a bot receiving the notification could make an edit to add an entry to that list.
[[Kategori:CommonsDB registry|Implementation]]
a34472pbtcocipzxgiih56jyn3mb4tr
135839
135838
2026-05-02T15:36:14Z
Ainali
5
/* Removing(?) declarations of deleted files from the registry */ typo
135839
wikitext
text/x-wiki
[[File:CommonsDB introduction.webm|thumb|200px|Explainer video of CommonsDB.]]
On this page we collect ideas, thoughts and considerations on how [[m:CommonsDB/Potential for the communities|the potential]] could be realized and what requirements that would place on the Wikimedia infrastructure. Some of the ideas are visualized at a high level in the video to the right. For now, all information goes on this page, but eventually some topics may need separate subpages.
== Checking if a new upload already is in CommonsDB ==
Note that the way this could be implemented may vary depending on the method used to upload an image. The UploadWizard may be the main place to implement it, as most of the other tools either have very experienced users (like OpenRefine users) with better than average knowledge about copyright, or (like the Wikimedia Commons app) mainly feature uploading original content. However, considering that there are many tools that can be used to upload to Wikimedia Commons, and that some tool developers want to benefit from more features and make their uploads as high quality as possible too, it might be worth designing the feature to be generic and reusable through an API or library, so that tools can use this functionality as a service.
=== UploadWizard ===
==== Check the CommonsDB register ====
# After media has been uploaded in the browser, an ISCC code can be generated for each of the submitted files.
# Use the ISCC for each of them to query the CommonsDB register to see if it is known.
## If assets with an exact match or very high similarity are found, suggest a license template/statement based on the result (see below), possibly with a link so that the user can explore/verify.
### If the asset was supplied by Wikimedia, instead show a text like "There is already a similar file on Wikimedia Commons. Only upload your file if you are sure it is adding value." and link to the asset.
## If assets with high similarity are found, provide a link so that the user can explore/verify and, if it was a match, suggest a license template/statement based on the result.
## If it is not, do nothing.
All through this, log which message was shown and what action the user took (did they abort the upload or did they continue, and if so, with the license we showed or something else?).
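The branching in the numbered list above can be sketched as a small decision function. This is only an illustration: the similarity labels ("exact", "very_high", "high") and the <code>supplied_by_wikimedia</code> flag are hypothetical names, since this page does not define what a response from the CommonsDB register looks like.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SUGGEST_LICENSE = "suggest a license template with a link to the match"
    WARN_ALREADY_ON_COMMONS = "warn that a similar file is already on Commons"
    LINK_FOR_REVIEW = "show a link so the user can verify the match"
    NOTHING = "do nothing"


def choose_action(similarity: Optional[str], supplied_by_wikimedia: bool) -> Action:
    """Pick what to show the uploader for one file.

    `similarity` is a hypothetical label for the best match found in the
    register, or None when the ISCC is not known there.
    """
    if similarity in ("exact", "very_high"):
        if supplied_by_wikimedia:
            return Action.WARN_ALREADY_ON_COMMONS
        return Action.SUGGEST_LICENSE
    if similarity == "high":
        return Action.LINK_FOR_REVIEW
    return Action.NOTHING
```

Returning an enum value rather than a message string would make it straightforward to log which message was shown, as asked for above.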
==== Selecting license templates ====
When checking for similar matches as described above, if a match is found the rights statement of that asset can be used to select the right license on Wikimedia Commons. One way could be to match the string to an item on Wikidata with that value in official website (P856) and then from that item find topic has template (P1424) and get the Commons sitelink from that value, which should be the correct template. Of course, this needs to be verified for all allowed rights statements in the registry, taking particular care with public domain, which may be modeled differently on Commons. For performance reasons, we could possibly make this a cached table and not query for this each time, as the values are unlikely to change very often, if at all.
Example: If the CommonsDB register says that the image has the license:
* https://creativecommons.org/licenses/by-sa/2.5/it/
we can query the Wikidata API with a request like this:
* https://www.wikidata.org/w/api.php?action=query&format=json&generator=search&formatversion=2&gsrsearch=haswbstatement%3AP856%3Dhttps%3A%2F%2Fcreativecommons.org%2Flicenses%2Fby-sa%2F2.5%2Fit%2F
This gives us the item Q98929925, from which we can again query which template this license is using through a query like:
* https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity=Q98929925&property=P1424&formatversion=2
This gives the item Q15304563 from which we can get the template name with:
* https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q15304563&languages=en&formatversion=2
which finds the name of the template on Commons in the sitelinks section, [[c:Template:Cc-by-sa-2.5-it|Template:Cc-by-sa-2.5-it]]. This can be suggested to the user as a suitable license template. The last two bullet points can also be done by a single SPARQL query, provided that all items from step one are used as values for it, like this: https://query-chest.toolforge.org/redirect/WzZyyJjus4wYSo2IUm0UMAMKY4AoqwOs6oE0o0uAWeg
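As a minimal sketch of the first step in the example above, the search request can be built from the license URL and the item IDs read from the response. The function names are made up for illustration, and the sketch assumes that the response lists matching pages under <code>query.pages</code> with the item ID in <code>title</code>. No network call is made here.

```python
from typing import List
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_license_search_url(license_url: str) -> str:
    """Build the haswbstatement search for items with this official website (P856)."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        "formatversion": "2",
        "gsrsearch": f"haswbstatement:P856={license_url}",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def item_ids_from_search(response: dict) -> List[str]:
    """Extract item IDs from a decoded search response (assumed shape, see lead-in)."""
    pages = response.get("query", {}).get("pages", [])
    return [page["title"] for page in pages]
```

Calling <code>build_license_search_url("https://creativecommons.org/licenses/by-sa/2.5/it/")</code> reproduces the first request URL in the example.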
Note that for images in the public domain we will need the PD Rationale to pick a good license on Commons as they have so many public domain templates to choose from. If that is not available, I think our best action is just to say that CommonsDB thinks this is in the public domain and leave it up to the user to pick the correct template.
Note that these values are unlikely to change very often, so instead of doing these API calls every time, it might be better to make a static lookup table where we get the template text for each canonical license URL. A table for this is at [[Projekt:CommonsDB registry/Implementation/License URL to license template|License URL to license template page]].
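A static lookup along those lines could be as simple as the sketch below. The single table entry is the one from the worked example on this page; the normalization rules (forcing https, lower-casing the host, adding a trailing slash) are assumptions about what "canonical" should mean and would need checking against the rights statements the registry actually allows.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical static table: canonical license URL -> Commons template name.
# Only this entry comes from the example above; a real table would be
# generated from the linked subpage.
LICENSE_URL_TO_TEMPLATE = {
    "https://creativecommons.org/licenses/by-sa/2.5/it/": "Cc-by-sa-2.5-it",
}


def canonical_license_url(url: str) -> str:
    """Normalize scheme, host case and trailing slash (assumed rules)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return f"https://{host}{path}"


def suggest_template(rights_statement_url: str) -> Optional[str]:
    """Return the Commons template for a rights statement, or None if unknown."""
    return LICENSE_URL_TO_TEMPLATE.get(canonical_license_url(rights_statement_url))
```

Returning None for unknown URLs leaves room for the public domain case, where the user picks the template.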
Since the license selection comes on the next page in the upload workflow, we likely need to carry the information over there, especially for the case when the user is uploading multiple images having different licenses in CommonsDB.
==== Store ISCC ====
The ISCC code should be stored as Structured Data: there is already a property for this, {{P|P13150}}. This should possibly be qualified with Wikimedia page-version URL (P7569) so that it is clear which uploaded file the ISCC is for.
=== Prototyping options ===
[[Fil:CommonsDB Wikimedia Nortwestern Europe hackathon - showcase video.webm|miniatyr|Demo of a user script generating an ISCC code and then checking the CommonsDB registry.]]
An alternate workflow where a user uploads to a tool on Toolforge could be created, where that tool generates the ISCC, does the check against the CommonsDB API and recommends a licensing template based on the result. Then it either makes the upload completely separately or forwards the information to the UploadWizard.
A somewhat more integrated workflow would be to create a user script. This could then "embed" itself in the regular workflow and not feel that different for the user. Possibly, parts of the work behind the scenes could still benefit from a tool on Toolforge to generate the ISCC.
The first option is likely easier to build, whereas the second one would be a more convincing demo for the end user. It might also make it clearer to WMF developers exactly where in the upload process the tool could be integrated.
=== Final implementation ===
Since the search API for the CommonsDB registry needs an API key that shouldn't be shared, a gadget or user script is not really possible, as the key would be visible to anyone. At the same time, it needs to be investigated what requirements follow from the fact that this is a service that is neither in the Wikimedia ecosystem nor open source. Likely, we need configuration for the user, possibly even making it opt-in with a clear message about the nature of the API.
== Declaring new uploads to the registry ==
We probably want some kind of "grace period" before we declare a new upload to the registry. This period will allow for poor uploads to be speedily deleted and for some of the most common bots to add metadata in structured form to the file (examples: [https://commons.wikimedia.org/w/index.php?title=File%3ACecropia_telenitida_448681395.jpg&diff=1101917792&oldid=1101059604 SchlurcherBot], [https://commons.wikimedia.org/w/index.php?title=File:Hamr%C3%A5ngefj%C3%A4rden_20250822_165319.jpg&diff=prev&oldid=1081728006 BotMultichillT]). The length of this period should be checked with the community and be long enough that the bots are likely to have added the metadata, which minimizes the need for updating declarations.
After the period has passed an upload can be considered as stable enough and a declaration should be made.
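The grace-period check itself is simple and could look like the sketch below. The seven-day value is purely a placeholder, since the length is still to be agreed with the community, and the function names are hypothetical. Timestamps are assumed to be in the UTC format used by MediaWiki, such as 2026-05-02T15:30:09Z.

```python
from datetime import datetime, timedelta, timezone

# Placeholder value only; the real length should be decided with the community.
GRACE_PERIOD = timedelta(days=7)


def parse_mw_timestamp(value: str) -> datetime:
    """Parse a MediaWiki-style UTC timestamp such as 2026-05-02T15:30:09Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def ready_to_declare(uploaded: str, now: datetime, deleted: bool = False) -> bool:
    """True when the file still exists and the grace period has passed."""
    if deleted:
        return False
    return now - parse_mw_timestamp(uploaded) >= GRACE_PERIOD
```

Passing <code>now</code> in explicitly keeps the check easy to test and to rerun over a backlog.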
=== Store declaration status ===
That a declaration has been made should be stored somewhere, possibly as Structured Data. Storing the Declaration-ID as a main statement will make it easy to query for, and also makes sense as it is a kind of external identifier. Optionally, there could also be a qualifier on either the ISCC statement or this one, connecting the two to each other (so that it is clear that the ISCC gave cause to the declaration ID). The Declaration-ID can be used to look up the declaration in the CommonsDB API, which returns a JSON file. Example: https://api.commonsdb.org/v1/metadata-pub/bbqjcaww2wlkm5lnswq6rmsejzxnscz3ynvyvly2o2pnanrjxng3yftpb
=== Prototyping options ===
This could be built as a prototype on Toolforge. It could be done in three steps. The first step is to identify suitable images to declare, generate the ISCC and put them in a queue. The second step, which we can enable when we feel that the first step is stable enough, is to do the declaration. The third step is to store the declaration status and ISCC on Toolforge in a way that can be inspected, and to show a suggested edit to Wikimedia Commons. When the needed properties have been created, those edits could be made to the file.
== Updating metadata in the registry ==
Sometimes the information on the file page or in its structured data is updated in a way that should also be reflected in the original declaration. In particular, changes of the license are important to update.
It seems nontrivial to detect changes like this, and a service that looks for such changes may be needed.
=== Prototyping options ===
Here we could experiment with a Toolforge tool scanning the recent changes on Commons, trying to detect relevant changes (like removing a Creative Commons template and adding a public domain template) on already declared images. In a first step, the goal is just to find the relevant changes and make it easy for us to see how well the detection works. A new declaration can be made either after human review, or automatically when we are confident that the detection is good enough.
== Navigating to the registry ==
Files that have a declaration ID or an ISCC code in the structured data could have a link to the registry, either to see what is stored there or to see if someone else has declared the file.
=== Prototyping options ===
* [[c:User:Ainali/commonsdb-link.js|A user script]] that checks if a file has an ISCC code and adds a link to CommonsDB explorer if it does. (Add by editing [[c:Special:MyPage/common.js|your common.js]] page like [https://commons.wikimedia.org/w/index.php?title=User:Ainali/common.js&diff=1151171313&oldid=1035432594 this].)
* Similarly, a user script could be created for the declaration ID once we have that property.
=== Final implementation ===
When there are enough files declared in the registry and their ISCC codes have been added as structured data, evolving the user script into a gadget is probably a good idea.
== Removing(?) declarations of deleted files from the registry ==
If a file gets removed from Wikimedia Commons, the registry should be updated or possibly have the entry removed.
Here, we probably need a tool that listens to the [[wikitech:Event Platform/EventStreams HTTP Service|EventStreams]] mediawiki.page-delete stream and checks every deleted page against our database of declared files. If we find a match, we should start the process of updating CommonsDB with this information. (That process is still to be decided, but likely a declaration with some specific value or status.)
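The matching step for each event could be sketched as below. Reading the stream itself is left out. The event fields used here (<code>database</code>, <code>page_namespace</code>, <code>page_title</code>) are assumptions about the page-delete event schema and should be checked against the EventStreams documentation, as should the assumption that <code>page_title</code> carries the "File:" prefix. The <code>declared</code> mapping stands in for our own database of declared files.

```python
from typing import Dict, Optional

FILE_NAMESPACE = 6  # namespace number of "File:" pages in MediaWiki


def match_deleted_file(event: dict, declared: Dict[str, str]) -> Optional[str]:
    """Return the declaration ID for a deleted Commons file, or None.

    `declared` maps page titles to declaration IDs (hypothetical storage).
    The event field names are assumptions, see the lead-in.
    """
    if event.get("database") != "commonswiki":
        return None
    if event.get("page_namespace") != FILE_NAMESPACE:
        return None
    return declared.get(event.get("page_title", ""))
```

A non-None result would then start the registry update process, whatever form that ends up taking.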
== Find duplicates among existing images ==
This would, in theory, be a one-time job of checking all images on Commons with the property ISCC against each other to see if there are duplicates. (In theory, because it is unlikely we will ever be fully caught up, as new uploads through alternative upload methods do not check for duplicates with ISCC.) It is likely that we do want some images that are very similar to each other, whereas others may have been mistakes of different kinds. For the mistakes, the community can decide if they want to delete the "duplicate" images (perhaps including redirects). For the ones that should be kept, a way to mark this as intentional would be good. Possibly this could be done with the Other versions field in the Information template. Also, files linking to each other using the Other versions field can be excluded from the backlog automatically as they have already been properly dealt with.
=== Suggesting category improvements ===
This is a similar one-time job for all of the images that are kept. It is likely that similar images should have at least one category in common, so a tool to easily make these edits could be made. This is likely a good "game" where users just approve or reject suggestions. If the user decides there should be no category in common, this may be odd to store on the files, which also suggests that an external tool can be useful.
== Find duplicates when preparing a batch upload ==
When doing batch uploads from GLAM partners, it [[w:sv:Wikipedia:Bybrunnen#c-Salgo60-20260315142800-Alicia_Fagerving_(WMSE)-20260313120300|may help]] to generate the ISCC codes pre-upload and check the files to see if there are any potential duplicates to prevent them ''before'' the upload. This may be helpful for regular users too, to avoid uploading an entire folder where there are some "burst shots" where a manual selection would be better.
=== Prototyping options ===
A local script (or app) that generates ISCC codes for a set of images and then just checks the similarity between them. The script could order the most similar images for manual review.
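A minimal sketch of the comparison step, assuming the ISCC codes have already been generated by some ISCC library and that the similarity-preserving part of each code is available as a hex string of equal length. Both are assumptions, and generating the codes is out of scope here. Similarity is measured as the Hamming distance between the bit strings, with smaller distances meaning more similar files.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming_distance(a_hex: str, b_hex: str) -> int:
    """Count the differing bits between two equal-length hex digests."""
    a, b = bytes.fromhex(a_hex), bytes.fromhex(b_hex)
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def rank_pairs(digests: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return all file pairs as (distance, file_a, file_b), most similar first."""
    pairs = [
        (hamming_distance(digests[a], digests[b]), a, b)
        for a, b in combinations(sorted(digests), 2)
    ]
    return sorted(pairs)
```

Comparing every pair is quadratic in the number of files, which should be fine for a folder or a modest batch but would need an index for larger sets.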
== Integrating in external tool workflows ==
For small GLAMs, adding requirements for extra checks that have to be done can feel burdensome. Therefore, it would be great if existing tools could be enhanced to help out. For example, could there be an extension to OpenRefine that, for images in a loaded project, both calculates the ISCC (which could then be submitted as structured data) and compares the images to each other?
== Find media that has entered the public domain since the upload ==
This is not really related to the ISCC as such, but as we have declared and signed the metadata, we should make efforts to keep that data accurate. In the best of worlds, the community is already on top of this and we would just follow the process for updating the metadata in the registry as described above. However, no tools that assist such workflows are known, so building one may be helpful for the community. A small subset, where the author of the file has a Wikidata item, could be discovered through a SPARQL query, by checking if enough time has passed since the author died. This can create a queue for human review, as it might not be possible to determine the exact legal conditions for each image automatically, since they vary so much per jurisdiction.
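Once the death year of the author has been fetched, the filter for the review queue could be as small as the sketch below. The 70-year term and the rule that protection runs to the end of the calendar year are placeholder assumptions, not legal advice. They only decide which files a human should look at.

```python
# Placeholder term; actual terms differ between jurisdictions.
DEFAULT_TERM_YEARS = 70


def is_review_candidate(death_year: int, current_year: int,
                        term_years: int = DEFAULT_TERM_YEARS) -> bool:
    """True if the assumed term after the author's death has fully elapsed.

    Assumes protection lasts until the end of the calendar year
    death_year + term_years.
    """
    return current_year > death_year + term_years
```

For example, an author who died in 1950 would be flagged in 2026, while one who died in 1960 would not be flagged until 2031 under these assumptions.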
== Resolve externally found "conflicts" ==
Here there are two scenarios, an initial check and then a continuous queue.
=== Initial check ===
This would be a one-time job of checking all images on Commons against other files in the registry. Files with an exact match or very high similarity where the rights statement and the license on the file do not agree should be put in a queue for human review. Likely some manual investigation is needed to understand why this conflict occurred. Then, either the license is changed, or the file is marked as having the right one, possibly triggering a notification to the other data provider.
=== Continuous queue ===
Whenever other data providers declare new files in the registry that are exact or highly similar matches to files on Commons but where the rights statements are not the same, we should have a way to receive that notification and investigate the file. This could be a VRT queue if it needs to be email, but a public on-wiki list that anyone could help check would be better. For this, a bot receiving the notification could make an edit to add an entry to that list.
[[Kategori:CommonsDB registry|Implementation]]
0qol9c8bsnmi55hhvqf11o4pawo38d5
135840
135839
2026-05-02T15:38:55Z
Ainali
5
/* Find duplicates among existing images */ clarification
135840
wikitext
text/x-wiki
o1duw36tl6p8j08guiwb0d0eizgur3x
135842
135840
2026-05-03T10:29:30Z
Ainali
5
/* Removing(?) declarations of deleted files from the registry */ link to the right stream
135842
wikitext
text/x-wiki
[[File:CommonsDB introduction.webm|thumb|200px|Explainer video of CommonsDB.]]
On this page we collect ideas, thoughts and considerations on how [[m:CommonsDB/Potential for the communities|the potential]] could be realized and what requirements that would place on the Wikimedia infrastructure. Some of the ideas are visualized at a high level in the video to the right. For now, all information goes on this page, but eventually some topics may need separate subpages.
== Checking if a new upload already is in CommonsDB ==
Note that the implementation may vary depending on the method used to upload an image. The UploadWizard may be the main place to implement it, as most of the other tools either have very experienced users (like OpenRefine users) with better-than-average knowledge of copyright, or (like the Wikimedia Commons app) mainly feature uploads of original content. However, since many tools can be used to upload to Wikimedia Commons, and some tool developers want to benefit from more features and make their uploads as high quality as possible too, it might be worth designing the feature to be generic and reusable through an API or library, so that tools can use this functionality as a service.
=== UploadWizard ===
==== Check the CommonsDB register ====
# After media has been uploaded in the browser, an ISCC code can be generated for each of the submitted files.
# Use the ISCC for each of them to query the CommonsDB register to see if it is known.
## If an asset with an exact match or very high similarity is found, suggest a license template/statement based on the result (see below), possibly with a link so that the user can explore/verify.
### If the asset was supplied by Wikimedia, instead show a text like "There is already a similar file on Wikimedia Commons. Only upload your file if you are sure it is adding value." and link to the asset.
## If an asset with high similarity is found, provide a link so that the user can explore/verify and, if it was a match, suggest a license template/statement based on the result.
## If no similar asset is found, do nothing.
Throughout this process, log which message was shown and what action the user took (did they abort the upload or did they continue, and if so, with the license we suggested or another one?).
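The branching in the list above could be sketched roughly as follows. This is only an illustration: the numeric thresholds, the 0..1 similarity scale, the <code>Match</code> fields and the message names are all assumptions, since the registry's actual response format is not described here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds: this page only speaks of "exact", "very high"
# and "high" similarity without giving numbers.
VERY_HIGH = 0.95
HIGH = 0.80


@dataclass
class Match:
    similarity: float            # assumed scale 0..1, where 1.0 is an exact match
    rights_statement: str        # e.g. a license URL from the registry
    supplied_by_wikimedia: bool  # True if the asset was declared by Wikimedia
    link: str                    # where the user can explore/verify the asset


def choose_message(match: Optional[Match]) -> str:
    """Return which message the upload workflow should show for the best match."""
    if match is None or match.similarity < HIGH:
        return "none"
    if match.similarity >= VERY_HIGH:
        if match.supplied_by_wikimedia:
            return "already_on_commons"
        return "suggest_license"
    # High, but not very high, similarity: let the user verify first.
    return "ask_user_to_verify"
```

The returned message name is also what would be written to the log, together with the action the user took afterwards.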
==== Selecting license templates ====
When checking for similar matches as described above, if a match is found, the rights statement of that asset can be used to select the right license on Wikimedia Commons. One way could be to match the string to an item on Wikidata with that value in official website (P856), then from that item find topic has template (P1424), and get the Commons sitelink from that value, which should be the correct template. Of course, this needs to be verified for all allowed rights statements in the registry, taking particular care with public domain, which may be modeled differently on Commons. For performance reasons, we could possibly make this a cached table and not query each time, as the values are unlikely to change very often, if at all.
Example: If the CommonsDB register says that the image has the license:
* https://creativecommons.org/licenses/by-sa/2.5/it/
we can query the Wikidata API with a query like this:
* https://www.wikidata.org/w/api.php?action=query&format=json&generator=search&formatversion=2&gsrsearch=haswbstatement%3AP856%3Dhttps%3A%2F%2Fcreativecommons.org%2Flicenses%2Fby-sa%2F2.5%2Fit%2F
This gives us the item Q98929925, from which we can in turn find which template this license uses through a query like:
* https://www.wikidata.org/w/api.php?action=wbgetclaims&format=json&entity=Q98929925&property=P1424&formatversion=2
This gives the item Q15304563 from which we can get the template name with:
* https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q15304563&languages=en&formatversion=2
which finds the name of the template on Commons in the sitelinks section, [[c:Template:Cc-by-sa-2.5-it|Template:Cc-by-sa-2.5-it]]. This can be suggested to the user as a suitable license template. The last two bullet points can also be done by a single SPARQL query, provided that all items from step one are used as values for it, like this: https://query-chest.toolforge.org/redirect/WzZyyJjus4wYSo2IUm0UMAMKY4AoqwOs6oE0o0uAWeg
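The three API calls above could be wrapped in small helpers like the ones below. This is a sketch, not a finished client: the function names are made up for illustration, the helpers only build the first URL and read already-fetched JSON, and the response shapes are assumed to be the usual ones for <code>formatversion=2</code>. Error handling and "unknown value" statements are ignored.

```python
from typing import Optional
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(license_url: str) -> str:
    """Build the search for items whose official website (P856) is license_url."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        "formatversion": "2",
        "gsrsearch": f"haswbstatement:P856={license_url}",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def item_from_search(response: dict) -> Optional[str]:
    """Step 1: take the first hit's title (a Q-id) from the search response."""
    pages = response.get("query", {}).get("pages", [])
    return pages[0]["title"] if pages else None


def template_item_from_claims(response: dict) -> Optional[str]:
    """Step 2: read the target of topic has template (P1424) from wbgetclaims."""
    claims = response.get("claims", {}).get("P1424", [])
    if not claims:
        return None
    return claims[0]["mainsnak"]["datavalue"]["value"]["id"]


def commons_template_from_entity(response: dict, qid: str) -> Optional[str]:
    """Step 3: read the Commons sitelink title from wbgetentities."""
    sitelinks = response.get("entities", {}).get(qid, {}).get("sitelinks", {})
    link = sitelinks.get("commonswiki")
    return link["title"] if link else None
```

Calling <code>search_url("https://creativecommons.org/licenses/by-sa/2.5/it/")</code> reproduces the first URL in the example above.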
Note that for images in the public domain we will need the PD Rationale to pick a good license on Commons, as there are so many public domain templates to choose from. If that is not available, our best action is probably just to say that CommonsDB considers this to be in the public domain and leave it up to the user to pick the correct template.
Note that these values are unlikely to change very often, so instead of doing these API calls every time, it might be better to make a static lookup table where we get the template text for each canonical license URL. A table for this is at [[Projekt:CommonsDB registry/Implementation/License URL to license template|License URL to license template page]].
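A static lookup of this kind could be as small as the sketch below. The only row included is the one from the example on this page; the remaining rows would come from the linked table. The normalization rules (forcing https, lowercasing, adding a trailing slash) are assumptions about what "canonical" should mean and would need checking against the URLs the registry actually returns.

```python
from typing import Optional

# License URL -> Commons template name. Only this pair is taken from the
# example above; further rows are expected to come from the lookup table page.
LICENSE_TEMPLATES = {
    "https://creativecommons.org/licenses/by-sa/2.5/it/": "Cc-by-sa-2.5-it",
}


def canonical(url: str) -> str:
    """Hypothetical normalization: https scheme, lowercase, trailing slash."""
    url = url.strip().lower()
    if url.startswith("http://"):
        url = "https://" + url[len("http://"):]
    if not url.endswith("/"):
        url += "/"
    return url


def template_for(url: str) -> Optional[str]:
    """Return the template name for a license URL, or None if unknown."""
    return LICENSE_TEMPLATES.get(canonical(url))
```

An unknown URL returns <code>None</code>, in which case no template would be suggested.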
Since the license selection comes on the next page in the upload workflow, we likely need to carry the information over there, especially for the case when the user is uploading multiple images having different licenses in CommonsDB.
==== Store ISCC ====
The ISCC code should be stored as Structured Data: there is already a property for this, {{P|P13150}}. It should possibly be qualified with Wikimedia page-version URL (P7569) so that it is clear which uploaded file version the ISCC is for.
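As an illustration, the statement could be assembled in the Wikibase JSON format like this. It is a sketch under assumptions: both properties are assumed to take plain string values, the helper name is invented, and actually saving the statement (for example through <code>wbeditentity</code>) is left out.

```python
def iscc_statement(iscc: str, page_version_url: str) -> dict:
    """Build a Wikibase statement for ISCC (P13150) qualified with P7569."""

    def snak(prop: str, value: str) -> dict:
        # Assumption: both properties use a string-typed datavalue.
        return {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"value": value, "type": "string"},
        }

    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": snak("P13150", iscc),
        "qualifiers": {"P7569": [snak("P7569", page_version_url)]},
    }
```

The qualifier ties the code to one specific file version, so a later re-upload can get its own statement.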
=== Prototyping options ===
[[Fil:CommonsDB Wikimedia Nortwestern Europe hackathon - showcase video.webm|miniatyr|Demo of a user script generating an ISCC code and then checking the CommonsDB registry.]]
An alternate workflow where a user uploads to a tool on Toolforge could be created, where that tool generates the ISCC, does the check against the CommonsDB API and recommends a licensing template based on the result. Then it either makes the upload completely separately or forwards the information to the UploadWizard.
A somewhat more integrated workflow would be to create a user script. It could then "embed" itself in the regular workflow, so that it does not feel that different to the user. Possibly, parts of the work behind the scenes could still benefit from a tool on Toolforge to generate the ISCC.
The first option is likely easier to build, whereas the second one would be a more convincing demo for the end user. It might also make it clearer to WMF developers exactly where in the upload process the tool could be integrated.
=== Final implementation ===
Since the search API for the CommonsDB registry needs an API key that shouldn't be shared, a gadget or user script is not really possible, as the key would be visible to anyone. At the same time, we need to investigate what requirements follow from the fact that this is a service that is neither in the Wikimedia ecosystem nor open source. Likely, we need configuration for the user, possibly even making it opt-in with a clear message about the nature of the API.
== Declaring new uploads to the registry ==
We probably want some kind of "grace period" before we declare a new upload to the registry. This period will allow poor uploads to be speedily deleted and some of the most common bots to add metadata in structured form to the file (examples: [https://commons.wikimedia.org/w/index.php?title=File%3ACecropia_telenitida_448681395.jpg&diff=1101917792&oldid=1101059604 SchlurcherBot], [https://commons.wikimedia.org/w/index.php?title=File:Hamr%C3%A5ngefj%C3%A4rden_20250822_165319.jpg&diff=prev&oldid=1081728006 BotMultichillT]). The length of this period should be checked with the community. It should be long enough that the bots are likely to have added their metadata, which minimizes the need for updating declarations.
After the period has passed, an upload can be considered stable enough and a declaration should be made.
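A minimal sketch of the grace-period check follows. The 14-day default is purely a placeholder, since the actual length is still to be agreed with the community, and the function name is invented.

```python
from datetime import datetime, timedelta, timezone

# Placeholder value: the real length should be decided with the community.
GRACE_PERIOD = timedelta(days=14)


def ready_to_declare(
    uploaded_at: datetime,
    now: datetime,
    deleted: bool = False,
    grace: timedelta = GRACE_PERIOD,
) -> bool:
    """True if the file still exists and the grace period has passed."""
    return (not deleted) and (now - uploaded_at >= grace)
```

For example, a file uploaded on 1 May is not ready on 10 May but is ready on 15 May with the placeholder value.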
=== Store declaration status ===
The fact that a declaration has been made should be stored somewhere, possibly as Structured Data. Storing the Declaration-ID as a main statement will make it easy to query for, and also makes sense as it is a kind of external identifier. Optionally, there could also be a qualifier on either the ISCC statement or this one, connecting the two to each other (so that it is clear that the ISCC gave rise to the declaration ID). The Declaration-ID can be used to look up the declaration in the CommonsDB API, which returns a JSON file. Example: https://api.commonsdb.org/v1/metadata-pub/bbqjcaww2wlkm5lnswq6rmsejzxnscz3ynvyvly2o2pnanrjxng3yftpb
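The lookup could be sketched as below, using only the URL pattern visible in the example above. The function names are invented, and nothing is assumed about the fields of the returned JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the example on this page.
METADATA_BASE = "https://api.commonsdb.org/v1/metadata-pub/"


def declaration_url(declaration_id: str) -> str:
    """Build the public metadata URL for a declaration ID."""
    return METADATA_BASE + quote(declaration_id, safe="")


def fetch_declaration(declaration_id: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON for one declaration (performs a network call)."""
    with urlopen(declaration_url(declaration_id), timeout=timeout) as response:
        return json.load(response)
```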
=== Prototyping options ===
This could be built as a prototype on Toolforge. It could be done in a few steps. The first step is to identify suitable images to declare, generate the ISCC and put them in a queue. The second step, which we can enable when we feel that the first step is stable enough, is to do the declaration. The third step is to store the declaration status and ISCC on Toolforge in a way that can be inspected, and to show a suggested edit to Wikimedia Commons. When the needed properties have been created, those edits could be made to the file.
== Updating metadata in the registry ==
Sometimes, when the information on the file page or its structured data is updated, the change should also be reflected in the original declaration. In particular, changes of the license are important to propagate.
It seems nontrivial to detect changes like this, and some service that looks for such changes may be needed.
=== Prototyping options ===
Here we could experiment with a Toolforge tool scanning the recent changes on Commons, trying to detect relevant changes (like removing a Creative Commons template and adding a public domain template) on already declared images. In a first step, the goal is just to find the relevant changes and make it easy for us to see how well the detection works. A new declaration can then be made either after human review, or automatically once we are confident that the detection is good enough.
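A first, rough detector could compare the license templates in two revisions of the wikitext. The sketch below is a heuristic only: it assumes that license templates can be recognized by a <code>Cc-</code> or <code>PD-</code> name prefix, which will miss other license templates and does not look at structured data at all.

```python
import re
from typing import Set, Tuple

# Matches the template name in "{{Name}}" or "{{Name|...".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}\n]+?)\s*(?:\||\}\})")


def license_templates(wikitext: str) -> Set[str]:
    """Return template names that look like license templates (heuristic)."""
    names = {m.group(1).strip() for m in TEMPLATE_RE.finditer(wikitext)}
    # Assumption: license templates start with "Cc-" or "PD-".
    return {n for n in names if n.lower().startswith(("cc-", "pd-"))}


def license_change(old: str, new: str) -> Tuple[Set[str], Set[str]]:
    """Return (removed, added) license templates between two revisions."""
    before, after = license_templates(old), license_templates(new)
    return before - after, after - before
```

A revision pair where both sets are empty can be skipped; anything else would go to the review list.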
== Navigating to the registry ==
Files that have a declaration ID or an ISCC code in their structured data could have a link to the registry, to see either what is stored there or whether someone else has declared the file.
=== Prototyping options ===
* [[c:User:Ainali/commonsdb-link.js|A user script]] that checks if a file has an ISCC code and adds a link to CommonsDB explorer if it does. (Add by editing [[c:Special:MyPage/common.js|your common.js]] page like [https://commons.wikimedia.org/w/index.php?title=User:Ainali/common.js&diff=1151171313&oldid=1035432594 this].)
* Similarly, a user script could be created for the declaration ID once we have that property.
=== Final implementation ===
When enough files have been declared in the registry and their ISCC codes have been added as structured data, evolving the user script into a gadget is probably a good idea.
== Removing(?) declarations of deleted files from the registry ==
If a file gets removed from Wikimedia Commons, the registry should be updated or possibly have the entry removed.
Here, we probably need a tool that listens to the [[wikitech:Event Platform/EventStreams HTTP Service|EventStreams]] on the [https://stream.wikimedia.org/v2/ui/#/?streams=mediawiki.page-delete mediawiki.page-delete stream] and checks every deleted page against our database of declared files. If we find a match, we should start the process of updating CommonsDB with this information. (That process is still to be decided, but it is likely a declaration with some specific value or status.)
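The matching step could be sketched as follows. The code works on lines already read from the stream and does not open the connection itself. The event field names (<code>database</code>, <code>page_namespace</code>, <code>page_title</code>) and the form of the title are assumptions that should be checked against the stream's schema.

```python
import json
from typing import Iterable, Iterator, Set

FILE_NAMESPACE = 6  # namespace number of File: pages in MediaWiki


def deleted_declared_files(
    sse_lines: Iterable[str], declared: Set[str]
) -> Iterator[str]:
    """Yield titles of deleted Commons files that we have declared.

    sse_lines are raw server-sent-event lines; only "data:" lines carry JSON.
    declared holds page titles in the same form as the events use (assumed).
    """
    for line in sse_lines:
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        if event.get("database") != "commonswiki":
            continue
        if event.get("page_namespace") != FILE_NAMESPACE:
            continue
        title = event.get("page_title")
        if title in declared:
            yield title
```

Each yielded title would then start the (still undecided) process of updating CommonsDB.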
== Find duplicates among existing images ==
This would, in theory, be a one-time job of checking all images on Commons that have the ISCC property against each other to see if there are duplicates. (In theory, because it is unlikely that we will ever be fully caught up, as alternative upload tools may not check for duplicates with ISCC.) It is likely that we do want to keep some images that are very similar to each other, whereas others may have been mistakes of different kinds. For the mistakes, the community can decide if they want to delete the "duplicate" images (perhaps leaving redirects). For the ones that should be kept, a way to mark this as intentional would be good. Possibly this could be done with the Other versions field in the Information template. Then, files linking to each other using the Other versions field can be excluded from being marked as duplicates automatically, as they have already been properly dealt with. (Obviously, we may find further versions later, which would then need to be dealt with.)
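The exclusion rule for files that already reference each other could look like the sketch below. How the candidate pairs and the Other versions links are obtained is out of scope here; both inputs, and the function name, are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def unresolved_duplicates(
    pairs: Iterable[Tuple[str, str]],
    other_versions: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Drop candidate pairs whose files already link to each other.

    other_versions maps a file title to the titles it lists in its
    Other versions field (assumed to be extracted beforehand).
    """
    result = []
    for a, b in pairs:
        mutually_linked = (
            b in other_versions.get(a, set())
            and a in other_versions.get(b, set())
        )
        if not mutually_linked:
            result.append((a, b))
    return result
```

Requiring the link in both directions is a deliberately strict reading of "linking to each other"; a one-directional link would still show up for review.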
=== Suggesting category improvements ===
This is a similar one-time job for all of the images that are kept. It is likely that similar images should have at least one category in common, so a tool to easily make these edits could be made. This is likely a good "game" where users just approve or reject suggestions. If the user decides there should be no category in common, this may be odd to store on the files, which also suggests that an external tool can be useful.
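The suggestion step for such a "game" could be sketched as below. The function name, the shape of the suggestion and the externally stored <code>rejected</code> set are all assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Set


def category_suggestion(
    file_a: str,
    cats_a: Set[str],
    file_b: str,
    cats_b: Set[str],
    rejected: Set[FrozenSet[str]],
) -> Optional[Dict[str, List[str]]]:
    """Suggest categories to copy between two similar files, if any.

    rejected holds pairs a user has already declined; it lives in the
    external tool because that decision would be odd to store on the files.
    """
    if frozenset((file_a, file_b)) in rejected:
        return None
    if cats_a & cats_b:
        return None  # already at least one category in common
    return {file_a: sorted(cats_b), file_b: sorted(cats_a)}
```

The user would then approve one or more of the proposed categories, or reject the pair, which adds it to <code>rejected</code>.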
== Find duplicates when preparing a batch upload ==
When doing batch uploads from GLAM partners, it [[w:sv:Wikipedia:Bybrunnen#c-Salgo60-20260315142800-Alicia_Fagerving_(WMSE)-20260313120300|may help]] to generate the ISCC codes pre-upload and check the files to see if there are any potential duplicates to prevent them ''before'' the upload. This may be helpful for regular users too, to avoid uploading an entire folder where there are some "burst shots" where a manual selection would be better.
=== Prototyping options ===
A local script (or app) that generates ISCC codes for a set of images and then just checks the similarity between them. The script could order the most similar images for manual review.
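The comparison and ordering step could be sketched like this. It assumes that each ISCC has already been decoded into a fixed-length integer fingerprint by an ISCC library, which is not shown, and it uses plain Hamming distance as the similarity measure.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two equal-length fingerprints."""
    return bin(a ^ b).count("1")


def rank_pairs(fingerprints: Dict[str, int]) -> List[Tuple[int, str, str]]:
    """Return all file pairs as (distance, file_a, file_b), most similar first.

    fingerprints maps file names to integer fingerprints (assumed to be
    decoded from the ISCC beforehand).
    """
    items = sorted(fingerprints.items())
    pairs = [
        (hamming(fa, fb), name_a, name_b)
        for (name_a, fa), (name_b, fb) in combinations(items, 2)
    ]
    return sorted(pairs)
```

Comparing every pair grows quadratically with the number of files, which should be fine for a folder or a typical batch but not for all of Commons.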
== Integrating in external tool workflows ==
For small GLAMs, added requirements for extra checks can feel burdensome. Therefore, it would be great if existing tools could be enhanced to help out. For example, could there be an extension to OpenRefine that, for images in a loaded project, both calculates the ISCC (which could then be submitted as structured data) and compares the images to each other?
== Find media that has entered the public domain since the upload ==
This is not really related to the ISCC as such, but as we have declared and signed the metadata, we should make an effort to keep that data accurate. In the best of worlds, the community is already on top of this and we would just follow the process for updating the metadata in the registry as described above. However, no tools that assist such workflows are known, and such tools may be helpful for the community. A small subset, where the author of the file has a Wikidata item, could be discovered through a SPARQL query, by checking if enough time has passed since the author died. This can create a queue for human review, as it might not be possible to determine the exact legal conditions for each image since they vary so much per jurisdiction.
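Once death years have been retrieved, building the review queue could be as simple as the sketch below. The 70-year term is only a common default and an assumption here; as noted above, the real conditions vary per jurisdiction, which is why the result is a queue for human review and not an automatic change.

```python
from typing import Dict, List, Optional


def first_review_year(death_year: int, term_years: int = 70) -> int:
    """First calendar year in which the work may be in the public domain.

    Assumes the term runs to the end of the calendar year, so the work
    becomes a candidate on 1 January of the following year.
    """
    return death_year + term_years + 1


def review_queue(
    authors: Dict[str, Optional[int]], current_year: int, term_years: int = 70
) -> List[str]:
    """Return authors whose works are candidates for human review.

    authors maps an author identifier to a death year, or None if unknown.
    """
    return sorted(
        author
        for author, death_year in authors.items()
        if death_year is not None
        and current_year >= first_review_year(death_year, term_years)
    )
```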
== Resolve externally found "conflicts" ==
Here there are two scenarios, an initial check and then a continuous queue.
=== Initial check ===
This would be a one-time job of checking all images on Commons against other files in the registry. Files with an exact match or very high similarity, where the rights statement in the registry does not match the license on the file, should be put in a queue for human review. Some manual investigation is likely needed to understand why the conflict occurred. Then, either the license is changed, or the file is marked as having the right one, possibly triggering a notification to the other data provider.
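The queueing condition could be sketched as below. The similarity threshold, the 0..1 scale and the URL normalization are assumptions for illustration. The comparison also assumes that the Commons license has already been mapped to a rights statement URL.

```python
def normalize(url: str) -> str:
    """Hypothetical normalization so trivially different URLs compare equal."""
    url = url.strip().lower().rstrip("/")
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url


def needs_review(
    similarity: float,
    commons_rights: str,
    registry_rights: str,
    threshold: float = 0.95,  # assumed cut-off for "very high similarity"
) -> bool:
    """True if the files match closely but their rights statements differ."""
    return similarity >= threshold and normalize(commons_rights) != normalize(
        registry_rights
    )
```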
=== Continuous queue ===
Whenever other data providers declare new files in the registry that are an exact match or highly similar to files on Commons, but where the rights statements are not the same, we should have a way to receive that notification and investigate the file. This could be a VRT queue if it needs to be email, but a public on-wiki list that anyone could help check would be better. For this, a bot receiving the notification could trigger an edit that adds an entry to that list.
[[Kategori:CommonsDB registry|Implementation]]
2nsl4zfzj0qr3worrfhjg0dql7sxzmc
Projekt:CommonsDB registry/Wikimedia Hackathon
100
28653
135841
135833
2026-05-03T09:30:19Z
Tommy Kronkvist
63
Orthography and Swedification.
135841
wikitext
text/x-wiki
[[mw:Wikimedia Hackathon|Wikimedia Hackathon]], 1–3 May.
Jan is attending (registration has been submitted).
== Purpose ==
* Socialize CommonsDB
* Gather ideas and input on concrete implementation
== Activities ==
* User tests with prototypes
* Demo of CommonsDB Explorer
* Investigate whether there are smarter ways to go from license to template when it is suggested during upload
* Investigate whether there are smarter ways to go from template to pdRationale when declaring
* Investigate best practices for EventStreams (when we want to follow changes to files, both when metadata (mainly structured data) changes and when the file is deleted).
* Ask about the [https://commons.wikimedia.org/w/index.php?api=attribution.v0-beta&title=Special%3ARestSandbox#/default/get_pages__title__signals Attribution API]
== Materials to bring ==
* Stickers
* Flyers
== See also ==
* [[Projekt:CommonsDB registry/Resebeslut Wikimedia Hackathon, Jan|Travel decision]]
* [[Projekt:CommonsDB registry/Wikimedia Hackathon Northwestern Europe 2026|Wikimedia Northwestern Europe Hackathon]]
[[Kategori:CommonsDB registry|Wikimedia Hackathon]]
2nhytbsqau0ce8p8aw0jyuqk6znglbj