PSET 4 solutions for cs50 class

whod unit

This program takes an input file, and changes the RGB values inside the file

It revolves around understanding that pixels are 3 bytes, stored in values, and interpreting how those values are shown on a computer.

All you had to understand was that inbetween the “fread” and “fwrite” mode, you could change the hexadecimal color values through its typedef struct

Basically white color is 0xFFFFFF, and red is 0x0000FF (the first four numbers after 0x are blue and green respectively). You change values to match that.

Imagine going to your monitor and adjusting blue,green, white values during color calibation. Putting all BGR values to 100 sets the color of screen to white. Actually I’m pretty sure if you took red,green, and blue paint and mixed them together in equal quantities it makes white.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
// iterate over infile's scanlines
for (int i = 0, biHeight = abs(bi.biHeight); i < biHeight; i++)
{
// iterate over pixels in scanline
for (int j = 0; j < bi.biWidth; j++)
{
// temporary storage
RGBTRIPLE triple;
// read RGB triple from infile, blue green red in order
fread(&triple, sizeof(RGBTRIPLE), 1, inptr);
//make some colors red - This part here
if (triple.rgbtGreen == 0x00){
triple.rgbtGreen = 0xFF;
}
if (triple.rgbtBlue == 0x00){
triple.rgbtBlue = 0xFF;
}
// write RGB triple to outfile
fwrite(&triple, sizeof(RGBTRIPLE), 1, outptr);
}
// skip over padding, if any
fseek(inptr, padding, SEEK_CUR);
// then add it back (to demonstrate how)
for (int k = 0; k < padding; k++)
{
fputc(0x00, outptr);
}
}
// close infile
fclose(inptr);
// close outfile
fclose(outptr);
// that's all folks
return 0;
}

The only things I added was line 15-20

Resize.C + bmp.h

This program takes an image, takes in a factor, and scales the image to the new factor size. So if the factor was 3, this is what would have to happen:

  1. The image width would need to be 3xs longer, in the appropiate locations.
  2. The first 3 rows in newest images need to be same thing

Its basically like taking an image and scaling it upwards

First the predefined header of the bmp file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#include <stdint.h>
typedef uint8_t BYTE;
typedef uint32_t DWORD;
typedef int32_t LONG;
typedef uint16_t WORD;
typedef struct
{
WORD bfType;
DWORD bfSize;
WORD bfReserved1;
WORD bfReserved2;
DWORD bfOffBits;
} __attribute__((__packed__))
BITMAPFILEHEADER;
typedef struct
{
DWORD biSize;
LONG biWidth;
LONG biHeight;
WORD biPlanes;
WORD biBitCount;
DWORD biCompression;
DWORD biSizeImage;
LONG biXPelsPerMeter;
LONG biYPelsPerMeter;
DWORD biClrUsed;
DWORD biClrImportant;
} __attribute__((__packed__))
BITMAPINFOHEADER;
typedef struct
{
BYTE rgbtBlue;
BYTE rgbtGreen;
BYTE rgbtRed;
} __attribute__((__packed__))
RGBTRIPLE;

Week 4 notes introduced typedefs and structs. Typedefs are just abbreviations to make it easier to type out variables. Structs are basically “super variables” or variables that house more variable types.

attribute_((__packged__)) basically just means ignore multiple of 4 padding for GCC compilation, see https://stackoverflow.com/questions/11770451/what-is-the-meaning-of-attribute-packed-aligned4

The C program below is as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
/**
* copy.c
*
* Computer Science 50
* Problem Set 4
*
* Copies a BMP piece by piece, just because.
*/
#include <stdio.h>
#include <stdlib.h>
#include "bmp.h"
int main(int argc, char* argv[])
{
// 4 arguments
if (argc != 4)
{
printf("Usage: ./copy factor infile outfile\n");
return 1;
}
//Checks for factor
int factor = atoi(argv[1]);
if (factor <= 0 || factor >= 100)
{
printf("Must be between 0 and 100\n");
return 6;
}
char* infile = argv[2];
char* outfile = argv[3];
// Input file open
FILE* inptr = fopen(infile, "r");
if (inptr == NULL)
{
printf("Could not open %s.\n", infile);
return 2;
}
// open output file
FILE* outptr = fopen(outfile, "w");
if (outptr == NULL)
{
fclose(inptr);
fprintf(stderr, "Could not create %s.\n", outfile);
return 3;
}
// read infile's BITMAPFILEHEADER
BITMAPFILEHEADER bf;
fread(&bf, sizeof(BITMAPFILEHEADER), 1, inptr);
// read infile's BITMAPINFOHEADER
BITMAPINFOHEADER bi;
fread(&bi, sizeof(BITMAPINFOHEADER), 1, inptr);
// ensure infile is (likely) a 24-bit uncompressed BMP 4.0
if (bf.bfType != 0x4d42 || bf.bfOffBits != 54 || bi.biSize != 40 ||
bi.biBitCount != 24 || bi.biCompression != 0)
{
fclose(outptr);
fclose(inptr);
fprintf(stderr, "Unsupported file format.\n");
return 4;
}

Most of the top portion of this file remains unchanged. I changed the number of arguments to 4, to account for the scaling factor on the image. Everything else here checks whether the files are reading correctly and a bitmap is actually hte input file by its number of bytes in header.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
//----------------new files -------------
//Initialize the new_bf
BITMAPFILEHEADER new_bf;
BITMAPINFOHEADER new_bi;
new_bf = bf;
new_bi = bi;
//Set newest
new_bi.biHeight = bi.biHeight*factor;
new_bi.biWidth = bi.biWidth*factor;
//Padding for fseek, multiple of 4
int padding = (4 - (bi.biWidth * sizeof(RGBTRIPLE)) % 4) % 4;
int new_padding = (4 - (new_bi.biWidth * sizeof(RGBTRIPLE)) % 4) % 4;
//New Image Size
new_bi.biSizeImage = (new_bi.biWidth *sizeof(RGBTRIPLE) + new_padding) * abs(new_bi.biHeight);
new_bf.bfSize = new_bi.biSizeImage + 54; //54 is BF+BI
//open file to write output
fwrite(&new_bf, sizeof(BITMAPFILEHEADER), 1, outptr);
fwrite(&new_bi, sizeof(BITMAPINFOHEADER), 1, outptr);

This part took me the longer to understand and comprehend. Essentially what happens is the following:

  • A new struct bf and bi, 54 bytes total, named new_bf and new_bi to prevent confusion of the existing var
  • Default values of original image are used (e.g. same offset values, planecolor values, etc) to initialize information
  • Set new width and height
  • Padding is used for fseek on the input, and new_padding for the fwrite on output
  • BiSizeImage is the size of the pixel area, bfSize is that + the 54 bytes in the header

An important concept that I learned from this is that there’s always 54 bytes in a .bmp file. Each value is offsetted by 2 or 4 bytes depending if its a DWORD, etc.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
// iterate over infile's scanlines
for (int i = 0, old_biHeight = abs(bi.biHeight); i < old_biHeight; i++) //changed to old_biheight to prevent confusion
{
//Not efficient, but scans over the same input line for factor*output rows
for (int l = 0; l<factor; l++)
{
// iterate over pixels in scanline
for (int j = 0; j < bi.biWidth; j++)
{
// temporary storage
RGBTRIPLE triple;
// read RGB triple from infile
fread(&triple, sizeof(RGBTRIPLE), 1, inptr);
//BGR + BGR + BGR factor # of times on width
for (int r=0; r<factor; r++){
fwrite(&triple, sizeof(RGBTRIPLE), 1, outptr);
}
}
//Input files padding, pass over seeker
fseek(inptr, padding, SEEK_CUR);
//Add multiple of 4 padding to new image size
for (int k = 0; k < new_padding; k++)
{
fputc(0x00, outptr);
}
//pushes cursor back and iterate # of factor times on "L" row
if (l < factor - 1)
{
long offset = bi.biWidth*sizeof(RGBTRIPLE)+padding; //By default, files are unsigned (nonnegative). Need to define negative signed long
fseek(inptr, -offset, SEEK_CUR); //this shoves cursor back the entire length to start
}
}
}
// close infile
fclose(inptr);
// close outfile
fclose(outptr);
// that's all folks
return 0;
}

Next after initializing everything, is scanning the input to create the output. I took the original file and essentially added a few things, namely:

  • FORLOOP l=0 mostly so if I had a factor of 3 on the image, I can repeat the same input row 3xs for output
  • FORLOOP r=0, this creates the width output. RGB values represent one pixel, and are repeated for a factor of 3 as : BGR+BGR+BGR
  • for (int k=0; k<newpadding;K++) this part is for the fputc for writing output, as we need multiple of 4
  • if l<factor-1 fseek line. Basically, there’s one technically only one scanline and as new inputs are being read, the SEEK_CUR is adjusting as new reads are happening. Because I’m iterating over the same input row multiple times, I need to re-read some row factor of times, hence this -offset. It needs to be set as a “long” function because by default its an unsigned (positive integer only) when I need negative pushback on fseek.

Recover.C

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
/**
* recover.c
*
* Computer Science 50
* Problem Set 4
*
* Recovers JPEGs from a forensic image.
*/
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
typedef uint8_t BYTE;
int main(void)
{
//Open the CF card
FILE* inptr = fopen("card.raw", "r");
//check if file is null
if (inptr ==NULL)
{
fclose(inptr);
fprintf(stderr, "card.raw file not working \n");
return 1;
}
char title[8];
FILE* img = NULL;
//Initialize variables
BYTE buffer[512];
int fnumber = 0;
int searchjpeg = 1; //1 if true, 0 if false
//Cycle through each 512 MB block
while(fread(&buffer, 512, 1, inptr)==1){
if(searchjpeg==1){ //Start of new jpeg?
if(buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] == 0xe0 || buffer[3] == 0xe1)) //4 of 16 possible
{ //yes
searchjpeg = 0;
sprintf(title, "%03d.jpg", fnumber);
img = fopen(title, "w");
fwrite(&buffer, 512, 1, img);
}
else
{ //no
printf("nothing\n");
}
}
else{ //already found a jpeg?
if(buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] == 0xe0 || buffer[3] == 0xe1))
{ //yes
fclose(img);
fnumber++;
sprintf(title, "%03d.jpg", fnumber);
img = fopen(title, "w");
fwrite(&buffer, 512, 1, img);
}
else
{ //no
fwrite(&buffer, 512, 1, img);
}
}
}
fclose(img);
free(buffer);
}

How it works

The comments here follow exactly the pseudocode outlined in CS50

  • sprintf("%03d.jpg", fnumber) gives file names as 001, 002, 003, etc as the 0 indicates to put leading 0’s
  • The first IF statements keep going until the first 512 bytes for a JPEG are found
  • Then only the next set of top level IF statements are used because (searchjpeg=0) now
  • An array of 512 bytes called buffer is constantly read and rewritten over
  • while(fread(&buffer, 512, 1, inptr)==1){ acts as both a logical statement to end the loop when no more data can be read, and physically reading the bytes into the buffer. 2 birds with 1 stone

History revisions

https://github.com/AnacondaPython/cs50solutions/commit/7139daff9d10f1d9146279e9cbe1ea9f8886cb34?diff=unified original code I hashed up had too many bugs. I tried using fseek -512 to move the cursor back and forth and using heap stack memory through malloc, but I could’ve just initialized an array on the stack according to https://www.quora.com/Where-are-the-arrays-in-C-stored-in-a-stack-or-a-heap

errors

https://cs50.stackexchange.com/questions/21776/pset4-works-correctly-but-check-shows-error/21777#21777

Everything performs as expected producing the 50 images perfectly on first iteration. But there’s some memory errors. Not really sure exactly where they are yet. Also the cs50 check isn’t producing perfect outputs but program still runs


Answer to 18 questions:

0. How many different colors does each format support

https://www.quora.com/How-many-different-colors-does-each-format-like-JPEG-BMP-GIF-and-PNG-suppor. It depends on number of bits the file allows on color. 8 bit is 256 colors

1. Which of the formats supports animation GIF, GIFV, WEBM

2. What’s the difference between lossy and lossless compression tech quickie summarizes it well. https://www.youtube.com/watch?v=guo8if4Yxhw .Lossless compression takes the binary code and condenses down the information some simple ciphers. A good visual answer here: http://i.imgur.com/ILqyO4j.png. Things that use lossless compression would be zip files, daemon tools, etc

Lossy compression is used for videos, results in chunks of data being removed. Youtube probably has the most advanced algorithm for this. Lossy compression can sometimes result in artifacts in videos.

http://lockergnome.com/2011/08/22/lossy-vs-lossless-video-compression/

https://www.youtube.com/watch?v=BsCeNCVb-d8

3. Which of these formats is lossy-compressed?

http://i.imgur.com/OoNdUgR.png. JPEG and GIFs are lossy. BMP and PNG are lossless MP3 and MP4 are also lossy

4. What happens, technically speaking, when a file is deleted on a FAT file system?

https://youtu.be/T87iyGiXb7o. Techquickie. The 010101010 within the system are unlinked to the mastertable, and given permissions to be rewritten over. If they’ve been overwritten then you lose it.

5. What can someone like you do to ensure (with high probability) that files you delete cannot be recovered?

Delete all the data. Rewrite (fill with more random data) several times over again on the same file so all the original bit and bytes are overrwriten. This option is usually done during reformatting of a hard drive

Physically destroying drive

6. What’s stdint.h?

Stdint.h is a header file

Specifies aliases for more primitive data types. There typedefs

7. What’s the point of using uint8_t, uint32_t, int32_t, and uint16_t in a program?

The number refers to the number of bits it can allocate. You can TYPEDEF them into more common terms like byte, long, word, etc

U stands for unsigned integer type. All 8 bits for that 1 byte are set to positive #s ranging from 0 to 255, this is for uint8_t. uint32_t

Signed values range from -128 to 127

Unsigned values from 0 o 255

http://stackoverflow.com/questions/247873/signed-versus-unsigned-integers

Unsigned bits can store larger values.

In this case integers

8. How many bytes is a BYTE, a DWORD, a LONG, and a WORD, respectively?

Byte is 8 bits, or 1 byte

DWORD is 32 bits ,or 4 bytes

LONG is 32 bits, or 4 bytes

WORD is 16 bits or 2 bytes

9. What (in ASCII, decimal, or hexadecimal) must the first two bytes of any BMP file be? (Leading bytes used to identify file formats (with high probability) are generally called “magic numbers.)”

Hex: 0x424d, 42 and 4d are the first two bytes. ASCII: BM for bitmap. reasoning 42 (hex) = 4(161)+ 2(160) = 66 in ASCII value or “B” Decimal would be 0

10. What’s the difference between bfSize and biSize? https://msdn.microsoft.com/en-us/library/windows/desktop/dd183374(v=vs.85).aspx

http://i.imgur.com/4HHtnWQ.png bfsize is the total number of bytes in the file header, 14 bytes. bisize is the number of bytes in the info header, 40 bytes

something like this, wikipedia’s documentation vs. CS50’s shorthand naming conventions are confusing:

A better conversion:

Also, looking into the file GDB describes what happens better

11. What does it mean if biHeight is negative?

(wikipedia and CS50 has been crappy as a resource for these problems, MSDN.microsoft.com is a better authority wiki)

https://msdn.microsoft.com/en-us/library/windows/desktop/dd318229(v=vs.85).aspx

Biheight being negative flips the DIB (also called bitmapinfo header) so its oriented towards top left instead of bottom left. Basically, I believe a flipped image across x-axis when representing the bitmap.

12. What field in BITMAPINFOHEADER specifies the BMP’s color depth (i.e., bits per pixel)?

Bitcount specifies the number of bits per pixel, and therefore max number of colors per pixel. Normally, 24 -BMP, 1 pixel has 3 bytes = 24 bits. Each byte represents a RGB values from 0 to 255 since its 1 byte each.

e.g. Red color = 1 byte Green color = 1 byte Blue = 1 byte

More bytes allocated means more colordepth, instead of 255 color choices, there could be more per RGB

13. Why might fopen return NULL in copy.c:37?

fopen might return null if it can’t find the file

14. Why is the third argument to fread always 1 in our code?

We’re specifiying number of elements read, there’s only 1 struct

If we had been reading two files, it’d be 2 (e.g. two bitmap files)

15. What value does copy.c:70 assign padding if bi.biWidth is 3?

Scanline needs to be a multiple of 4. Why it needs to be this way it isn’t explained. Bi.biwidth =3. Had to look it up

http://cboard.cprogramming.com/game-programming/6102-bitmap-4-byte-boundry.html

Basically, it has to do with the CPU and intels standarization to 4 bytes, and a byte being 4 bits.

16. What does fseek do?

Skips over any padding and then looks for the next pixel represented by RGB triple. File position indicator

17. What is SEEK_CUR?

https://www.reddit.com/r/cs50/comments/283i04/pset5_understanding_seek_cur/

This is the current position of the file pointer, to skip over the padding

18. whod’ unit

See https://anacondapython.github.io/77cs50pset4/

Pretty sure its rick astley rick rolling

Resources

  • None

Vincent Tang

Comments