Please see Data Normalization , Common Pointers and Foreign Keys to keep data clean with referential integrity. This will help you.
Storing data in arrays may seem natural to you on paper, but on the db engine, performance will be mostly without an index. Moreover, you will find on Day 2 that receiving and maintaining your data will be a nightmare.
The following should get you going with a good start when you mess. Joins .
create table student ( studentId int auto_increment primary key, fullName varchar(100) not null -- etc ); create table dept ( deptId int auto_increment primary key, deptName varchar(100) not null -- Economics -- etc ); create table course ( courseId int auto_increment primary key, deptId int not null, courseName varchar(100) not null, -- etc CONSTRAINT fk_crs_dept FOREIGN KEY (deptId) REFERENCES dept(deptId) ); create table SCJunction ( -- Student/Course Junction table (aka Student is taking the course) -- also holds the attendance and grade id int auto_increment primary key, studentId int not null, courseId int not null, term int not null, -- term (I am using 100 in below examples for this term) attendance int not null, -- whatever you want, 100=always there, 0=he must have been partying, grade int not null, -- just an idea -- See (Note Composite Index) at bottom concerning next two lines. unique key(studentId,courseId,term), -- no duplicates allowed for the combo (note student can re-take it next term) key (courseId,studentId), CONSTRAINT fk_sc_student FOREIGN KEY (studentId) REFERENCES student(studentId), CONSTRAINT fk_sc_courses FOREIGN KEY (courseId) REFERENCES course(courseId) );
Create test data
insert student(fullName) values ('Henry Carthage'),('Kim Billings'),('Shy Guy'); -- id 1,2,3 insert student(fullName) values ('Shy Guy'); insert dept(deptName) values ('History'),('Math'),('English'); -- id 1,2,3 insert course(deptId,courseName) values (1,'Early Roman Empire'),(1,'Italian Nation States'); -- id 1 and 2 (History dept) insert course(deptId,courseName) values (2,'Calculus 1'),(2,'Linear Algebra A'); -- id 3 and 4 (Math dept) insert course(deptId,courseName) values (3,'World of Chaucer'); -- id 5 (English dept) -- show why FK constraints are important based on data at the moment insert course(deptId,courseName) values (66,'Fly Fishing 101'); -- will generate error 1452. That dept 66 does not exist -- That error is a good error to have. Better than faulty data -- Have Kim (studentId=2) enrolled in a few courses insert SCJunction(studentId,courseId,term,attendance,grade) values (2,1,100,-1,-1); -- Early Roman Empire, term 100 (made up), unknown attendance/grade insert SCJunction(studentId,courseId,term,attendance,grade) values (2,4,100,-1,-1); -- Linear Algebra A insert SCJunction(studentId,courseId,term,attendance,grade) values (2,5,100,-1,-1); -- World of Chaucer -- Have Shy Guy (studentId=3) enrolled in one course only. He is shy insert SCJunction(studentId,courseId,term,attendance,grade) values (3,5,100,-1,-1); -- Early Roman Empire, term 100 (made up), unknow attendance/grade -- note if you run that line again, the Error 1062 Duplicate entry happens. Can't take same course more than once per term
Some simple questions.
Which course is in which department?
show everything, uses table aliases (abbreviations) to make typing less, readability (sometimes) better
select c.courseId,c.courseName,d.deptId,d.deptName from course c join dept d on c.deptId=d.deptId order by d.deptName,c.courseName -- note the order +----------+-----------------------+--------+----------+ | courseId | courseName | deptId | deptName | +----------+-----------------------+--------+----------+ | 5 | World of Chaucer | 3 | English | | 1 | Early Roman Empire | 1 | History | | 2 | Italian Nation States | 1 | History | | 3 | Calculus 1 | 2 | Math | | 4 | Linear Algebra A | 2 | Math | +----------+-----------------------+--------+----------+
Who takes the course of Chaucer's world for this period?
(knowing the course Id = 5)
The following is one of our composite indexes in SCJunction. A component is an index on more than one column.
select s.StudentId,s.FullName from SCJunction j join student s on j.studentId=s.studentId where j.courseId=5 and j.term=100 +
Kim Billings credited to this term?
select s.StudentId,s.FullName,c.courseId,c.courseName from SCJunction j join student s on j.studentId=s.studentId join course c on j.courseId=c.courseId where s.studentId=2 and j.term=100 order by c.courseId DESC
Kim is overloaded, so the drop in math class damage
delete from SCJunction where studentId=2 and courseId=4 and term=100
run this select select statement showing what Kim is doing:
+-----------+--------------+----------+--------------------+ | StudentId | FullName | courseId | courseName | +-----------+--------------+----------+--------------------+ | 2 | Kim Billings | 5 | World of Chaucer | | 2 | Kim Billings | 1 | Early Roman Empire | +-----------+--------------+----------+--------------------+
Ah, much easier. Dad won't be happy though.
Pay attention to things like SCJunction.term. Much can be written about this, I will skip it at the moment mainly, except to say that it should also be somewhere in FK. You may want your term to look more like SPRING2015 rather than int.
And as for id. So I would do it. This is a personal preference. This will require knowledge of id # looking at them. Others might choose a course, like HIST101, rather than 17. They are more readable (but slower in the index (barely). Do what is best for you.
Note Compound Index
A component index (INDEX means KEY and vice versa) is one that combines multiple columns to quickly retrieve data. Orders are turned upside down for two composites in the SCJunction table, so depending on the query universe that comes with your data, the db engine can choose which index to use for quick searches based on the leftmost column you are going to post.
As for the unique key, No. 1, the comment next to it, indicating that forced duplication (which means unwanted data), requires no explanation. For example, student 1 course 1 term 1 cannot exist twice in this table.
The most important concept to understand is the concept of arranging left-most column names in an index.
For requests that go after studentId only , then the key with studentId specified first ( left-most ) is used. In queries that go after courseId only , then the key with courseId on the left is used. In queries that follow studentId and courseId, the db mechanism can decide which composite key to use.
When I say "go for," I mean on clause or where clause .
If you did not have these two composite keys (with inverted columns 1 and 2), then in queries where the column you are looking for is not indexed left-most , you will not need to use the key and the tablecan will not be slow to return data.
So, these two indexes combine the following 2 concepts
- Fast data retrieval based on the leftmost or both (student and learning columns)
- Ensuring no duplication of data in this table based on studentId, courseId, and term
The takeaway
An important notion is that Junction tables for quick index search and prudent data management compared to comma-separated data (massive thinking) clogged in a column, and all the suffering in using such a construct.