DSpace / DS-1187

Full-text indexing of right-to-left PDF files


    Details

    • Type: Improvement
    • Status: Closed (View Workflow)
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.8.2, 3.4, 4.3, 5.2
    • Fix Version/s: 6.0
    • Component/s: DSpace API
    • Labels:
    • Environment: All versions
    • Attachments: 2
    • Comments: 3
    • Documentation Status: Not Required

      Description

      PDF files written in right-to-left (RTL) scripts, such as Arabic, are not indexed correctly by full-text indexing (filter-media): every word is indexed in reverse. As a result, search queries do not match any text in the document's full text.
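To illustrate the failure mode, the sketch below assumes the extractor emits RTL characters in visual order (left to right as drawn on the page), so an Arabic word such as "كتاب" arrives reversed and can never match a query typed in normal reading order. It is not the attached pdffilter-parseRTL.patch; the class and method names (RtlTokenFix, isRtl, toLogicalOrder) are hypothetical and it uses only the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not DSpace's PDFFilter or the attached patch.
class RtlTokenFix {

    // True if the first strong-direction character in the token is
    // right-to-left (Hebrew or Arabic letters); false if it is left-to-right.
    static boolean isRtl(String token) {
        for (int i = 0; i < token.length(); ) {
            int cp = token.codePointAt(i);
            byte dir = Character.getDirectionality(cp);
            if (dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return true;
            }
            if (dir == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    // Assumes the line was extracted in visual order: reverse the characters of
    // each RTL token so it reads in logical order; leave other tokens untouched.
    static List<String> toLogicalOrder(String visualLine) {
        List<String> out = new ArrayList<>();
        for (String token : visualLine.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            out.add(isRtl(token) ? new StringBuilder(token).reverse().toString() : token);
        }
        return out;
    }

    public static void main(String[] args) {
        // "كتاب" (book) as a visual-order extraction would produce it: "باتك".
        String extracted = "\u0628\u0627\u062A\u0643 PDF";
        System.out.println(toLogicalOrder(extracted));
    }
}
```

Per-token reversal keeps the example short, but it does not reorder words within a line and would misplace combining marks such as Arabic diacritics; a full implementation of the Unicode Bidirectional Algorithm is the more robust route.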

        Attachments

        1. book1_bw.pdf
          1.22 MB
          Saiful Amin
        2. pdffilter-parseRTL.patch
          2 kB
          Saiful Amin


              People

              Assignee:
              helix84 Ivan Masár
              Reporter:
              saiful Saiful Amin
              Votes:
              1
              Watchers:
              3

                Dates

                Created:
                Updated:
                Resolved:

                  Time Tracking

                  Estimated:
                  1h (original estimate)
                  Remaining:
                  1h (remaining estimate)
                  Logged:
                  Not Specified