Ah man, this is a tough one - mostly because it's so long, so it's very hard to comment on any one particular thing!
Adding the voices definitely helps a lot. The quality is a bit mixed, but it's loud and clear with no sound effects or music interrupting, and the delivery is mostly fine. Way better than using some text-to-speech stuff. The gag with Tofu's swearing being bleeped was funny, though the actual mooing sound started getting extremely grating. Clicking through it again, a good example is at about 1:47, where Tofu just retorts with an "uh huh" but the sound effect lasts for a good 5 seconds. It doesn't align well with such a short response and there's not much of a gag payoff anywhere, and then the very same thing occurs right after at about 2:05. It really slows the pace down, and after a while it's kind of painful to listen to because it's a slow, "scraping" sound. You could maybe get a similar effect through a visual gag with some extra face graphics, like one where the top half of the eyes is covered to create a disinterested expression? The other characters have at least a couple frames showing various moods, even the news anchor.
At 9:37 Tofu just seems to moo for no reason. Was a line missing?
The backgrounds are definitely improved, though you could maybe tweak the colors in some to be a bit more muted compared to the characters, or give the characters somewhat thicker outlines to stand out more against the backgrounds. Maybe don't use black outlines for the backgrounds either, but it's not a massive problem. The "human" characters could probably use a couple extra identifiers since they all have the same voice, like Doc having some prominent glasses or something, and Rufus could have a more pallid complexion. Either that or lay it on thick with your best Igor impression. It's probably easiest to lay out the script, and then do all the lines for one character at a time with whatever impersonation or accent you want to use.
Animation is hard and there's always something that can be tweaked and polished, but I think you're on the right track. As stupid as it sounds, I think the mooing might be better if you just do it yourself so you can match the length to the lines better? The sound quality might be better too so it hurts less to listen to for 15+ minutes.