Suppose I have a customer table:
CREATE TABLE customers ( customer_number INTEGER, customer_name VARCHAR(...), customer_address VARCHAR(...) )
There is no primary key in this table. However, customer_name and customer_address must be unique for any given customer_number .
Often this table contains many duplicate clients. To get around this duplication, the following query is used to isolate only unique clients:
SELECT DISTINCT customer_number, customer_name, customer_address FROM customers
Fortunately, the table has traditionally contained accurate data. That is, there has never been a conflicting customer_name or customer_address for any customer_number . However, suppose conflicting data is in the table. I want to write a query that will fail, and not return multiple rows for the customer_number in question.
For example, I tried this query without success:
SELECT customer_number, DISTINCT(customer_name, customer_address) FROM customers GROUP BY customer_number
Is there a way to write such a query using standard SQL? If not, is there a solution in Oracle-specific SQL?
EDIT: Justification for a strange request:
In truth, this customer table does not actually exist (thank god). I created it, hoping it would be clear enough to demonstrate the needs for the request. However, people (fortunately) will realize that the need for such a request is the least of my worries based on this example. Therefore, I must now clear part of the abstraction and, hopefully, restore my reputation for having offered such an abomination to the table ...
I get a flat file containing invoices (one per line) from an external system. I read this file in turn, inserting its fields into this table:
CREATE TABLE unprocessed_invoices ( invoice_number INTEGER, invoice_date DATE, ... // other invoice columns ... customer_number INTEGER, customer_name VARCHAR(...), customer_address VARCHAR(...) )
As you can see, the data coming from the external system is denormalized. That is, the external system includes both account data and related customer data in one line. It is possible that multiple invoices will share the same customer, so it is possible to have duplicate customer data.
The system cannot start processing invoices until all customers are registered in the system. Therefore, the system must identify unique customers and register them as necessary. That's why I need a query: because I was working with denormalized data, I had no control over .
SELECT customer_number, DISTINCT(customer_name, customer_address) FROM unprocessed_invoices GROUP BY customer_number
Hope this helps clarify the original meaning of the question.
EDIT: Good / Bad Data Examples
To clarify: customer_name and customer_address must be unique for a particular customer_number .
customer_number | customer_name | customer_address ---------------------------------------------------- 1 | 'Bob' | '123 Street' 1 | 'Bob' | '123 Street' 2 | 'Bob' | '123 Street' 2 | 'Bob' | '123 Street' 3 | 'Fred' | '456 Avenue' 3 | 'Fred' | '789 Crescent'
The first two lines are exact because they are the same customer_name and customer_address for customer_number 1.
The middle two rows are exact because they are the same customer_name and customer_address for customer_number 2 (although the other customer_number has the same customer_name and customer_address ).
The last two lines do not match , because for customer_number 3 there are two different customer_address es.
The query I'm looking for will fail if you run all six of these lines. However, if only the first four lines existed, the view should return:
customer_number | customer_name | customer_address ---------------------------------------------------- 1 | 'Bob' | '123 Street' 2 | 'Bob' | '123 Street'
Hope this clarifies what I meant by "conflicting customer_name and customer_address ". They must be unique to customer_number .
I appreciate those that explain how to correctly import data from external systems. In fact, I already do most of this already. I deliberately hid all the details of what I am doing to make it easier to focus on the issue. This request should not be the only form of verification. I just thought it would be a nice touch (last defense, so to speak). This question was simply designed to examine only what was possible with SQL. :)