Community Wishlist Survey 2023/Wikisource/Remember OCR column/region profiles/Proposal
- Problem: Currently, OCR performs poorly on books with irregular layout features such as notes and columns (compare the OCR output for wikisource:Page:Login_USENIX_Newsletter_feb1983.djvu/2 with the same page in the OCR tool, where rectangular regions can be selected).
- Proposed solution: Provide a mechanism (a Gadget, or through Wikimedia OCR) that lets users mark out columns and other areas that should be OCRed together, and store these areas against the Index page so they don't need to be redrawn for every page.
- Who would benefit: Wikisource editors
- More comments:
- Phabricator tickets:
- Proposer: Sohom Datta (talk) 12:37, 30 January 2023 (UTC)
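As an illustration of the proposed solution, here is a minimal Python sketch of how a stored region profile might be modelled. It is not based on any existing Wikimedia OCR or Gadget API. The names `Region`, `ProfileStore` and `crop_boxes` are hypothetical, and it assumes regions are stored as fractions of the page size (0.0 to 1.0) so that one profile saved against an Index page can be scaled to every page scan in it, in the order the user drew them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """Hypothetical: a rectangle to OCR as one unit, in page-relative coordinates (0.0 to 1.0)."""
    x: float
    y: float
    width: float
    height: float

    def __post_init__(self) -> None:
        # Reject regions that fall outside the page (small tolerance for float rounding).
        eps = 1e-9
        if not (self.x >= 0 and self.y >= 0 and self.width > 0 and self.height > 0
                and self.x + self.width <= 1 + eps and self.y + self.height <= 1 + eps):
            raise ValueError(f"region out of page bounds: {self}")

    def to_pixels(self, page_width: int, page_height: int) -> tuple[int, int, int, int]:
        """Return a (left, top, right, bottom) crop box for a page image of the given size."""
        return (
            round(self.x * page_width),
            round(self.y * page_height),
            round((self.x + self.width) * page_width),
            round((self.y + self.height) * page_height),
        )


@dataclass
class ProfileStore:
    """Hypothetical in-memory store mapping an Index page title to its ordered regions."""
    profiles: dict[str, list[Region]] = field(default_factory=dict)

    def save(self, index_title: str, regions: list[Region]) -> None:
        # Keep regions in the order the user drew them; that order is the reading order.
        self.profiles[index_title] = list(regions)

    def crop_boxes(self, index_title: str, page_width: int,
                   page_height: int) -> list[tuple[int, int, int, int]]:
        # Every page of the Index reuses the same profile, scaled to that page's pixel size.
        # An Index with no saved profile falls back to a single full-page region.
        regions = self.profiles.get(index_title, [Region(0.0, 0.0, 1.0, 1.0)])
        return [r.to_pixels(page_width, page_height) for r in regions]
```

For a two-column page, saving `[Region(0.0, 0.0, 0.5, 1.0), Region(0.5, 0.0, 0.5, 1.0)]` once against the Index page yields the crop boxes `(0, 0, 1000, 3000)` and `(1000, 0, 2000, 3000)` for a 2000×3000 scan. Each box would then be OCRed separately and the results joined in order, so the left column is never interleaved with the right. Relative coordinates are a design assumption: they keep a single profile usable even when page scans in the same file differ slightly in resolution.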