Data mining and information extraction have traditionally been separate tasks, each with its own communities, conferences, data sets, algorithms, and so forth. In the last few years especially, we've started to see a lot of consolidation and bridge-building between the two communities, as people realize that the two tasks really need to be part of one continuous workflow. Most of the interesting data out there (especially text) lives across a diverse, often heterogeneous set of information sources. People in information extraction are typically interested in extracting structured records from unstructured or semi-structured data, e.g. names and addresses from Web pages, which they can use to populate, say, relational databases. People in data mining generally assume they already have cleaned-up data in attribute/value pairs and want to explore it to find interesting patterns. In some cases this modularity of roles works quite well. But because of a huge variety of factors, including the emergence of immensely rich and freely available datasets on the Web as well as sophisticated new learning algorithms that people want to use, people have begun to realize that the traditional model of separation and modularity between tasks has largely failed them.
One of the main problems is reference linking. How do I know that the wording "Dr. Johnson" on press release X refers to the same entity as the wording "D. Johnson" on web page Y? If you run your information extraction algorithms on each data source independently and then naively merge the results to hand off to some data mining algorithm, you get the whole garbage in/garbage out problem. This is why, for example, the CiteSeer computer science research index says that some D. Johnson is the most highly cited author, when in fact whichever one they're referring to most likely is not. It's also why ZoomInfo says there are 6 different Padhraic Smyths when, in fact, there is only one. Automatically figuring out how many entities there really are, and which entity a given reference points to, is still a hard research problem and one that has become very hot in just the last few years. I don't have the state-of-the-art accuracies, e.g. F-1 scores, in front of me, but I know they're way below human performance. The hardness of this problem is one reason why the vast majority of new commercial products/services out there, which are based on fusing information from disparate sources on the Web, seem unable to add much value without wasting a huge amount of user time.
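To make the garbage in/garbage out point concrete, here is a minimal sketch of what naive merging looks like. Everything in it is made up for illustration: the records, the citation counts, and the `naive_key` and `naive_merge` helpers are hypothetical and are not how CiteSeer or ZoomInfo actually work.

```python
from collections import defaultdict

# Hypothetical extracted records: (source, name as written, citation count).
# The names and numbers are invented for illustration only.
records = [
    ("press_release_X", "Dr. Johnson", 120),
    ("web_page_Y", "D. Johnson", 95),
    ("web_page_Z", "David Johnson", 80),
    ("web_page_W", "Donald Johnson", 60),
]

TITLES = {"dr", "prof", "mr", "ms", "mrs"}


def naive_key(name):
    """Reduce a name to (first initial, last name), dropping titles."""
    tokens = [t.strip(".").lower() for t in name.split()]
    tokens = [t for t in tokens if t not in TITLES]
    last = tokens[-1]
    # If only a last name survives, there is no first initial to use.
    first_initial = tokens[0][0] if len(tokens) > 1 else ""
    return (first_initial, last)


def naive_merge(records):
    """Sum citation counts for all records that share the same naive key."""
    merged = defaultdict(int)
    for _source, name, count in records:
        merged[naive_key(name)] += count
    return dict(merged)


merged = naive_merge(records)
print(merged)
```

On this toy data the merge makes both kinds of mistake at once. "D. Johnson", "David Johnson", and "Donald Johnson" collapse into a single entry that looks highly cited, while "Dr. Johnson" ends up as a separate entity because no first initial survived. Whatever data mining algorithm runs next inherits both errors.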
Another important reason people are tying the two tasks more closely together is that, in some cases, data mining and information extraction algorithms can help each other during the learning/discovery process. For example, suppose I want to automatically identify the communities that individuals in a social network belong to from semi-structured text, say emails. The uncertainty I have about who a sender is can inform my community-finding algorithm, and the community memberships I'm considering assigning to a person can in turn reduce my uncertainty about whether two different senders are in fact the same person.
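A rough sketch of one direction of that feedback, using community-like evidence to sharpen a guess about sender identity, might look like the following. It is a one-shot heuristic score, not joint inference, and all of it is assumed for illustration: the sender records, the use of shared recipients as a stand-in for community membership, and the equal weighting are hypothetical choices.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """String similarity between two written names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recipient_overlap(r1, r2):
    """Jaccard overlap of two recipient sets (a crude proxy for community)."""
    union = r1 | r2
    return len(r1 & r2) / len(union) if union else 0.0


def same_sender_score(s1, s2, weight=0.5):
    """Blend name evidence with community evidence; weight is an assumption."""
    name = name_similarity(s1["name"], s2["name"])
    overlap = recipient_overlap(s1["recipients"], s2["recipients"])
    return weight * name + (1 - weight) * overlap


# Hypothetical senders extracted from email headers.
a = {"name": "P. Smyth", "recipients": {"alice", "bob", "carol"}}
b = {"name": "Padhraic Smyth", "recipients": {"alice", "bob"}}
c = {"name": "P. Smith", "recipients": {"dave", "erin"}}

print(name_similarity(a["name"], b["name"]), name_similarity(a["name"], c["name"]))
print(same_sender_score(a, b), same_sender_score(a, c))
```

Looking at the strings alone, "P. Smyth" is closer to "P. Smith" than to "Padhraic Smyth". Once shared recipients are taken into account the ranking flips, which is the kind of help the mining side can give the extraction side. A real system would also feed the updated identity beliefs back into the community assignments and iterate.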
Finally, a third reason these two tasks are coming together is that the last decade has seen a whole new mathematical foundation laid, one that unifies many problems in each task that were once thought to be completely distinct: the theory of graphical models. Hidden Markov models, Kalman filters, and maximum entropy models can all be unified, thanks primarily to pioneering work done in the '80s by Judea Pearl. Unfortunately, industry has been slow to catch up on this research, and the vast majority of information extraction engines being deployed are still rule-based, which makes them labor-intensive to build and often very brittle when ported across domains. As for data mining, most people are building on relational database technologies, which assume the data is cleaned up and can be put into structured records. Where that isn't possible, you often end up with non-standardized data mining implementations built in-house.
A great vision paper which elaborates on most of these ideas can be found here.