Page 1 of 1
My Multithreaded Merge Sort Algorithm takes 10+ minutes to s
Posted: Wed Nov 02, 2011 12:19 am
by davidthefat
So this is for my AP Computer Science class in high school (equivalent to 2nd semester of Comp Sci 101) that focuses on data structures. So we have this little contest on who can make the fastest sorting algorithm. This current algorithm only takes 450 ms for sorting 20,000 strings. I do not know how that is compared to others, but seems like I'm on the right track. It handles 100,000 strings fine; less than 7 seconds (all these numbers include reading and writing the files) However, it just is a slug when it comes to 1 million strings. I tried to minimize the copying of data and just kept it to accessing using pointers (From what I've been told, all objects in java are pointers) Is this a nature of merge sort in itself or is it a flaw in my methodology?
Keep in mind, my system is a quad core cpu so I purposely made it only have 4 active running threads per file being sorted at any given time, but I've heard that the rule of thumb is 1.5 * the # of cores. The overhead from initiating the threads are negligible IMHO. It takes about 1/4 the time a single threaded merge sort does on my PC. And I am very aware that I should be commenting my file... Just a habit of never commenting for years.
I purposely did not join the sort threads until I started all of them by design. I know that it would be sluggish around 5+ files, but I will be testing one file at a time.
My sort method: [1]
http://pastebin.com/YwHU4BPz
My "main" file: [2]
http://pastebin.com/9kj4fpXL
Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes
Posted: Wed Nov 02, 2011 9:47 am
by short
ok your going for speed ..... are you forced to write it in java...?
(you know somebody was going to ask, might as well be me)
Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes
Posted: Wed Nov 02, 2011 12:12 pm
by Rebornxeno
A little overhead really adds up if your making threads for everything you sort
Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes
Posted: Wed Nov 02, 2011 6:00 pm
by davidthefat
short wrote:ok your going for speed ..... are you forced to write it in java...?
(you know somebody was going to ask, might as well be me)
Yes, I am forced to write in java because the class is taught in java.
Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes
Posted: Thu Nov 03, 2011 3:57 am
by LeonBlade
short wrote:ok your going for speed ..... are you forced to write it in java...?
(you know somebody was going to ask, might as well be me)
Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes
Posted: Thu Nov 03, 2011 6:29 am
by GroundUpEngine
davidthefat wrote:short wrote:ok your going for speed ..... are you forced to write it in java...?
(you know somebody was going to ask, might as well be me)
Yes, I am forced to write in java because the class is taught in java.
The joys of life lol
Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes
Posted: Thu Nov 03, 2011 3:27 pm
by Ginto8
Since the performance is fine up until 1 million strings, I'm guessing you're running into memory problems, and the computer is starting to use (uber-slow) swap space. That, or the function stack is getting a little too deep and is slowing things down. I think it's a space complexity issue though.
Edit: taking a closer look at your implementation, I notice 2 things: first, that your method, run on a large data set, would create a HUGE amount of threads. And second, that these threads are... well, useless. If all you're doing is starting the threads then waiting for them to end, why not just have them be non-threaded methods? I think if you removed the threading you could actually make the algorithm significantly faster.
Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes
Posted: Thu Nov 03, 2011 8:41 pm
by Light-Dark
uh-oh Java, well Java doesn't do well performance wise i at least have hope that you are also learning C/++ so you may be forgiven for your transgressions?
Re: My Multithreaded Merge Sort Algorithm takes 10+ minutes
Posted: Thu Nov 03, 2011 11:37 pm
by davidthefat
Ginto8 wrote:Since the performance is fine up until 1 million strings, I'm guessing you're running into memory problems, and the computer is starting to use (uber-slow) swap space. That, or the function stack is getting a little too deep and is slowing things down. I think it's a space complexity issue though.
Edit: taking a closer look at your implementation, I notice 2 things: first, that your method, run on a large data set, would create a HUGE amount of threads. And second, that these threads are... well, useless. If all you're doing is starting the threads then waiting for them to end, why not just have them be non-threaded methods? I think if you removed the threading you could actually make the algorithm significantly faster.
If you have not noticed, it actually does not spawn more than 6 threads other than the main and read file threads (and other overhead threads java has, if it has any)
Code: Select all
@Override
public void run()
{
mergesort(0, num - 1, 2); //This makes it only spawn 6 threads; 2 threads that calls two each.
}
Code: Select all
public static void mergesort(int l, int h, int i)
{
if (l < h)
{
int m = (l + h) / 2;
if(i > 0) //Checks if that "i" in the argument is bigger than 0. It recursively decrements it two lines below. The else statement does not generate any threads.
{
merger t1 = new merger(l, m, i - 1);
merger t2 = new merger(m + 1, h, i - 1);
t1.start();
t2.start();
try
{
t1.join();
t2.join();
}
catch(InterruptedException e)
{
System.out.println(e.getMessage());
}
}
else
{
mergesort(l, m, 0);
mergesort(m + 1, h, 0);
}
Light-Dark wrote:uh-oh Java, well Java doesn't do well performance wise i at least have hope that you are also learning C/++ so you may be forgiven for your transgressions?
Do not worry; I know C++. In fact, the teacher almost did not let me take the class because I used C++ all year in his class two years ago... Just a fun fact. Why I am taking the class again is because he is teaching data structures and algorithms to people who already too the class.
edit: I'm a freaking idiot... I knew constructing the array of strings every iteration will **** me up... So I fixed it; it runs like a charm. 1 million strings now take ~790 ms... Fixed Code: http://pastebin.com/Jqf4TVEk