I am working on an internationalized database application that supports multiple locales in a single instance. When international users sort data in applications built on top of the database, the database should, in theory, sort the data using a collation appropriate to the language of the data the user is viewing.
I am trying to find sorted lists of words that meet two criteria:
- the sorted order follows the sorting rules for the locale
- the listed words exercise most or all of the locale-specific sorting rules
I am having trouble finding reliable test data of this kind. Are such data sets currently available for sorting, and if so, where are they?
For example, "words.en.txt" is a text file containing American English words:
Andrew Brian Chris Zachary
I plan to load the word list into my database in a randomized order and check if the sorting of the list matches the original input.
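A rough sketch of the check I have in mind, in Python. The function name `sorts_back_to_expected` and the `sort_fn` parameter are made up for illustration; in the real test, `sort_fn` would load the rows into the database and read them back with an ORDER BY under the locale's collation, whereas here it defaults to Python's built-in `sorted` (plain code point order).

```python
import random

def sorts_back_to_expected(expected, sort_fn=sorted, seed=0):
    """Shuffle the expected list, re-sort it, and compare with the original order."""
    shuffled = list(expected)
    # A fixed seed keeps a failing shuffle reproducible.
    random.Random(seed).shuffle(shuffled)
    return list(sort_fn(shuffled)) == list(expected)

# Contents of words.en.txt, already in the expected order.
english = ["Andrew", "Brian", "Chris", "Zachary"]
print(sorts_back_to_expected(english))
```

The reference file itself is the oracle, so the test is only as good as the word list: it has to be both correctly ordered and rich enough to trigger the locale's special rules.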
Since I don’t speak any language other than English, I don’t know how to create such samples for other locales. For example, a French sample (call it “words.fr.txt”) might look like this:
cote côte coté côté
French collation compares diacritics from right to left. If you sorted this list by code point instead, it would most likely come out like this (which is an incorrect sort):
cote coté côte côté
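To make the difference concrete, here is a toy Python comparison of the two orderings. This is only my simplified sketch of the right-to-left accent rule, not the full Unicode Collation Algorithm: the helper `french_key` is a made-up name, and it ignores case and everything else a real collation handles.

```python
import unicodedata

def french_key(word):
    """Two-level key: base letters first, then accents compared right to left."""
    bases = []
    accents = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # Attach the combining mark to the most recent base letter.
            if accents:
                accents[-1] += (ch,)
        else:
            bases.append(ch)
            accents.append(())
    # Reversing the accent list makes the last accent the most significant.
    return (tuple(bases), tuple(reversed(accents)))

# Normalize to NFC so each accented letter is a single code point.
words = [unicodedata.normalize("NFC", w) for w in ["cote", "côte", "coté", "côté"]]

code_point_order = sorted(words)             # plain code point comparison
french_order = sorted(words, key=french_key)  # accents weighed from the right

print(code_point_order)
print(french_order)
```

With this key, all four words tie on their base letters, so the order is decided entirely by the accents read from the end of the word.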
Thanks for the help, Chris.