File-1 has 5 million strings and File-2 has 1 million strings. Give an algorithm to remove duplicates and merge these files into File-3 (the result need not be sorted).
Assuming that File-1 and File-2 have no duplicates within themselves, and that memory is not a limitation:
- Iterate through File-2, adding each string to a hash set (e.g., Java's HashSet) and writing each string to File-3.
- Iterate through File-1, and for each string check whether it is present in the hash set; if it is not, write it to File-3. If memory is an issue, you could back the set with a memory-mapped file instead of keeping all strings on the heap. A sketch of the in-memory version follows below.
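A minimal Java sketch of this two-pass approach, assuming the files are newline-delimited and using the hypothetical names file1.txt, file2.txt, and file3.txt:

```java
import java.io.*;
import java.util.HashSet;
import java.util.Set;

public class MergeDedup {
    public static void main(String[] args) throws IOException {
        // Strings already written to File-3; the smaller file is loaded first.
        Set<String> seen = new HashSet<>();

        try (BufferedWriter out = new BufferedWriter(new FileWriter("file3.txt"))) {
            // Pass 1: copy File-2 to File-3, remembering every string.
            try (BufferedReader in = new BufferedReader(new FileReader("file2.txt"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    seen.add(line);
                    out.write(line);
                    out.newLine();
                }
            }
            // Pass 2: copy strings from File-1 only if File-2 did not contain them.
            try (BufferedReader in = new BufferedReader(new FileReader("file1.txt"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (!seen.contains(line)) {
                        out.write(line);
                        out.newLine();
                    }
                }
            }
        }
    }
}
```

Loading the smaller file (1 million strings) into the set and streaming the larger one keeps the memory footprint as low as this approach allows; overall the work is roughly O(n + m) time with O(m) extra space, where m is the size of the smaller file.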