
Dedupe the feeds before indexing or while saving to the main store.

description

  • Dedupe the feeds before indexing or while saving to the main store.
  • A possible fix is to maintain one more version of the main store (a staging copy), which would help prevent duplication.
  • Also add an extra delete step before adding to the main store and to the final index.
    - It may also be worth merging old and new records (by comparing them) before adding to the final index and main store, because the data can differ between the old and new versions.
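
The staging-and-merge idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: records are plain dicts keyed by a `url` field, the main store is an in-memory dict, and `merge_record` and `promote_staging` are hypothetical names, not part of the project. Replacing the stored record in place stands in for the "delete, then add" step.

```python
from typing import Dict, Iterable, Tuple

Record = Dict[str, str]


def merge_record(old: Record, new: Record) -> Record:
    """Start from the old record and overwrite fields with non-empty new values."""
    merged = dict(old)
    for key, value in new.items():
        if value not in ("", None):
            merged[key] = value
    return merged


def promote_staging(main_store: Dict[str, Record],
                    staging: Iterable[Record]) -> Tuple[int, int]:
    """Move staged records into the main store without creating duplicates.

    Assumption: the feed URL identifies a record. Returns (added, updated).
    """
    added = updated = 0
    for item in staging:
        key = item["url"].strip()
        if key in main_store:
            # Existing entry: compare old and new, keep whichever data is newer.
            merged = merge_record(main_store[key], item)
            if merged != main_store[key]:
                main_store[key] = merged  # replaces the old copy in place
                updated += 1
        else:
            main_store[key] = dict(item)
            added += 1
    return added, updated
```

The same keys that end up in `main_store` would then be the only ones sent to the final index, so the index cannot hold two entries for one feed.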

comments

wrote Nov 7, 2012 at 6:12 PM

Associated with changeset 23904.

wrote Nov 7, 2012 at 7:16 PM

wrote Nov 7, 2012 at 7:17 PM

wrote Nov 7, 2012 at 7:17 PM

Dedupe is implemented by saving queued URLs; testing now.
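
For readers without access to the changeset, a minimal sketch of what "saving queued URLs" might look like is given here. It is an assumption, not the committed code: `QueuedUrlLog` and `enqueue_new` are hypothetical names, and the saved URLs live in an in-memory set rather than whatever persistent store the project uses.

```python
from typing import Iterable, List, Set


class QueuedUrlLog:
    """Remembers every URL that has been queued so it is never queued twice."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def enqueue_new(self, urls: Iterable[str]) -> List[str]:
        """Return only the URLs not queued before, in their original order."""
        fresh: List[str] = []
        for url in urls:
            key = url.strip()
            if key and key not in self._seen:
                self._seen.add(key)  # save the URL so later batches skip it
                fresh.append(key)
        return fresh
```

With this shape, a second crawl that rediscovers the same feed URL yields an empty list for it, so nothing duplicate reaches the main store or the index.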

wrote Jan 17, 2013 at 6:51 PM