Talk:Wikicite/grant/Brazilian Laws: Modeling the Brazilian legislation in Wikidata
@Ederporto: Thank you for this proposal document. I would like to ask you a few questions:
- As you are proposing a mass upload of content to Wikidata, I think it is important that you can demonstrate some community consensus/support for this proposed import. At the moment you have listed three locations where you will notify the community, only one of which is on Wikidata. I suggest you expand that.
- You mention "documenting in detail the process and methodology for future replicability", and a tool on Toolforge to do this. Can you elaborate, here or on the proposal itself, on how replicable your model will be for other countries/legal jurisdictions? The ability for other people to use your process and tools with other datasets, completely independently from you, is very important: it is the difference between this being a grant for 'we will upload Brazilian laws' and 'we will enable anyone to upload any laws'. This is a substantial difference in the 'value' of the project. I would appreciate more details about that aspect.
- Relatedly: I would appreciate one of the 'measures of success' being whether a non-Brazilian legal context can also use your workflow/tool.
- The currently available locations for the dataset are: website of the Palácio do Planalto, and LEXML. Are these open access? And do you plan to scrape their data? What is the copyright/database-right situation in Brazil for the items (and the full text)? How do Brazilian legal professionals access the text of laws normally? For comparison, I'm Australian and I used to work for the website which publishes all Australian law: http://www.austlii.edu.au/ . Is there something like that?
- The budget includes $2k for a handout. What is that? In the context of this project it is taking 20% of the entire budget - that seems disproportionate. Can you explain a bit more?
Sincerely, LWyatt (WMF) (talk) 17:22, 29 September 2020 (UTC)
- Hi, @LWyatt (WMF):, here are my answers:
- 1. Thank you for your comments. I notified some communities (but will add more to the list) and will follow up on their feedback.
- 2. I believe the modeling can have two main parts: one more general, with properties/qualifiers and schemas common to legislation items, and another with more specific properties/qualifiers/identifiers (mainly identifiers). What is being proposed is in fact a tool that fetches the metadata of the Brazilian legislation. I do not think one single script will, without constant and hard maintenance, be able to scrape and match properties of different datasets if they are not fully interoperable. But the idea is to develop a methodology (and document it) that others can follow and contribute to when creating similar projects for their countries/legal jurisdictions.
- 3. Yes, they are. The texts of laws aren't protected by copyright in Brazil, and LexML follows the Senate license ('Art. 8, inc. IV' - The following are not protected by copyright as dealt with in this Law: (...) IV - the texts of treaties or conventions, laws, decrees, regulations, judicial decisions and other official acts; and [1]).
- 4. Maybe an English error on my part? The idea is a series of tutorials (on-wiki and off-wiki; one example is Wiki Edu's files on how to edit Wikipedia in several topics). The value is what we usually get for this type of activity.
- Let me know if you have any more questions. Good contributions, Ederporto (talk) 21:29, 30 September 2020 (UTC)
Tool?
Hi! I am excited about the opportunity of more laws on Wikidata, we really need that. But I don't understand what the tool that should be used 100 times will do. You mention QuickStatements, which I assume is what will be used for the actual import. So this tool is a formatter of sorts? How likely is it then that 100 other people will have sources of data where your tool is of use? Is the metadata about the laws in a common metadata format/standard? --♥Ainali talkcontributions 17:35, 30 September 2020 (UTC)
- @Ainali: Thank you for your comment. About the tool, what I imagine for it is a formatter for the Brazilian websites cited (Palácio do Planalto and LexML) to wikidatify their content and present QS commands to the user to upload them. I think this could be a starting point for other countries/legislations to do the same (maybe in the same tool). Good contributions, Ederporto (talk) 21:42, 30 September 2020 (UTC)
- Ok, but I have a bit of a hard time understanding how you calculated the scale. You have two sites, and I imagine any tool that is more specialized than a spreadsheet (which is already fairly easy to use for converting to QS) will be very hard to use on websites that are not formatted exactly the same as those two (unless it is fully automatic and can adapt to a site intelligently, but that seems like a much larger job than 240 hours). Are there 98 more sites in the world following the same standard? ♥Ainali talkcontributions 22:13, 30 September 2020 (UTC)
- @Ainali: I'm referring to the following statement: "One tool to import the law data into the Wikimedia projects (At least 100 successful uses)". It relates to the statement right above it, "Import all Brazilian legislation available at Palácio do Planalto website into Wikidata (+-28K items)", meaning one tool to import the data from Palácio do Planalto (and LexML, as they refer to the same object) with at least 100 successful uses, that is, 100 items about Brazilian law created using the tool. Good contributions, Ederporto (talk) 20:02, 2 October 2020 (UTC)
- Alright, that makes more sense in one way. But building a tool to create 100 items seems like a lot of overhead. Even if you spent one hour per item (which is quite generous) you would be done in less than half the time allotted for this project if you did it manually. ♥Ainali talkcontributions 20:33, 2 October 2020 (UTC)
- @Ainali: The tool will not be created just for the 100 items; that's just the measure of success: people who do not know how the tool works and were not involved in building it should be able to use it successfully 100 times. It will be built to import all of them. Good contributions, Ederporto (talk) 16:01, 7 October 2020 (UTC)
- But if you are building a tool for importing, why not import all of the possible items at the same time? And then make the measure of success that the import is completed (and just a note of how many items were created). ♥Ainali talkcontributions 17:54, 7 October 2020 (UTC)
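To illustrate the "formatter" idea discussed in this thread (source metadata in, QuickStatements commands out), here is a minimal Python sketch. It is not the proposed tool: the `LawRecord` fields, the `to_quickstatements` helper, and the placeholder class item `Q00000` are assumptions made for this example, and fetching data from Palácio do Planalto or LexML is left out entirely. P31 (instance of), P577 (publication date) and S854 (reference URL used as a source) are existing Wikidata properties, but which properties actually fit legislation is the modeling question the proposal would have to settle.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical placeholder: replace with the real Wikidata item
# for the class of legal act being imported.
LAW_CLASS_QID = "Q00000"


@dataclass
class LawRecord:
    """Illustrative metadata for one law (field names are assumptions)."""
    title: str
    published: date
    source_url: str


def to_quickstatements(record: LawRecord, class_qid: str = LAW_CLASS_QID) -> str:
    """Format one record as tab-separated QuickStatements V1 commands."""
    day = record.published.isoformat()
    lines = [
        "CREATE",                                # start a new item
        f'LAST\tLpt\t"{record.title}"',          # Portuguese label
        f"LAST\tP31\t{class_qid}",               # instance of
        # publication date at day precision (/11), sourced with a reference URL
        f'LAST\tP577\t+{day}T00:00:00Z/11\tS854\t"{record.source_url}"',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example record, not real legislation data.
    example = LawRecord("Lei de exemplo", date(2020, 9, 29), "https://example.org/lei")
    print(to_quickstatements(example))
```

Each record yields one CREATE block that can be pasted into QuickStatements. Only the step that produces `LawRecord` objects would be site-specific, which is one way to read the "general modeling plus specific identifiers" split mentioned earlier on this page. Titles containing double quotes would likely need cleaning before use.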
Comment: An important point here, IMHO, is that the metric being used only considers a specific group of legislation in Brazil, and therefore this project could serve as a pilot to expand to different historical periods in Brazil and to different administrative levels: in Brazil, like in many other countries, laws are promulgated not only at the federal level but also at the state and municipal levels. This documentation is of key relevance for Wikisource and Wikipedia in Portuguese, and the immediate global impact includes the need to establish ontologies and resources that can be used in other cases. --Joalpe (talk) 16:52, 7 October 2020 (UTC)
Folks here might be interested in that proposal. –MJL ‐Talk‐☖ 21:25, 3 October 2020 (UTC)