Google code jam

From NoskeWiki
Jump to: navigation, search

About

Google Code Jam (GJC) is an international programming competition hosted and administered by Google which began in 2003. It's run every year, starting with a medium-difficult qualification round, and then several rounds of difficult problems. The competition fascinates me, so I've started to work out some of my own solutions in my spare time. Solutions are judged by (1) how fast they were coded, (2) successful running on their first attempt and (3) overall run time. Winning solutions for a past problems are on the Google Code Jam website, but on this pages I've shown some of my own solutions where I've taken the extra time to add comments and make the code relatively easy to read. In coding competitions like this, competitors will typically use "terse programming" (designed for speed, not readability) so few if any posted solutions will include comments. In my own code I always like to include explanations and comments, so that others can easily understand and maintain my code, and so that's what I aim to do here.


DISCLAIMER: So far I've only had time to solve a few of the many problems, and haven't yet typed up brief descriptions and explanations for any, so yes, this pages is a work in progress!


Technique

In these Google Code Jam you can use a number of different languages, but here I've chosen C++. It seems to be that around 50% of posted solutions on the Code Jam site itself are in C++ too. Since speed is of the essence to "score" well in the competition, most people will write very "terse code" meaning:

  • mostly procedural code (you may see some struct, but there's no time to write big classes)
  • very small variable names (most of them single letters)
  • use of the STL classes and library functions (as much as possible)
  • no error checking (assume your input)
  • no comments (no time!)
  • minimal console program which pipes data in then out


Almost all Google Code Jam problem give input in the form of a text files (eg: "A-small-practice.in"), which you must parse and then produce your answer into another text file (eg: "A-small-practice.out"). The output file can then be uploaded to the website to tell you if you are wrong or right. The quickest way to do this in C++ is to "pipe" the input file into your input program, parse text with "cin" and then pipe out to a file with "cout". Here's a very minimal example of how this works:

Our input file "A-small-practice.in":

2
1 1
2 2

Our expected output file "A-small-practice.out":

Case #1: 2
Case #2: 4

In this example the first line tells us how many "cases" we have, and each line after that has two numbers (a and b) which we must add together and write out. A minimal C++ program to parse in then and output the data is as follows:

Our (single) source file called "a.cpp":

#include <iostream>
using namespace std;
 
int main()
{
  int cases, a, b;                 // Short variable names.
  cin >> cases;                    // Most of these problems start with "number of cases" on the first line.
  for(int c=1; c<=cases; c++)      // So it makes sense to have a for loop for the number of cases:
  {
    cin >> a; cin >> b;                                 // For each case we'll read in a certain number of input
    cout << "Case #" << c << ": " << (a*b) << endl;     // and usually write out one line for each case.
  }
  return 0;                        // Return 0 for success (some compilers will complain if you do a "void main" with no return.
}

Note that this method needs only the <iostream> library. We also only want a single source file and use a very short name "a.cpp". Although the problem I've shown is overly simplistic (normally your solutions will need data structures and at least three times more lines of code), it serves as a good skeleton for the input/output which you can use for all harder problems. To compile our code and then pipe in and out our files we need these lines:

g++ -o a.exe a.cpp
./a.exe < A-small-practice.in > A-small-practise.out

The second line feeds the input input file (in the same directory) into our program and outputs the result to an output file - ready to test on the site for correctness. This method of "terse code" and piping input/output may "feel" a bit wrong to programmers who like descriptive name and like to have programs that are more versatile (i.e. could take in multiple files), check for errors and use descriptive file names. Here however speed is of the essence, so compare the above example to what our code might look like the slow way with a file library like <fstream>:

#include <iostream>   // For '<<' and '>>', cin, cout and cerr (for errors).
#include <fstream>    // To read in and write out files, and includes istream.getline(char* s, streamsize n, char delim).
#include <stdlib.h>   // Includes the "atoi" function.
#include <string>     // May want for "getline(isteam& is,string& out,char delim='\n')".
using namespace std;  
 
int main()
{
  ifstream in ("A-small-practice.in");         // The file we want to read in.
  ofstream out("A-small-practice.out");        // The file we want to output to.
  if (!in.is_open() || in.eof())               // Error checking just slows us down.
  {
    cerr << "ERROR: invalid input file" << endl;
    return (-1);
  }
  if(!out.is_open())                            // So for code jam you shouldn't bother with these!
  {
    cerr << "ERROR: couldn't create ouput file" << endl;
    return (-1);
  }
 
  int numCases;                       // |-- Descriptive variable names are useful in real programming,
  int valueToAdd1;                    // |   but will slow you down in code jam or a code interview.
  int valueToAdd2;                    // |
  string line;                        // |-- If we want to call "getline" continually we need some type of string
  getline(in, line, '\n');            // |   then need to read in each line, and then have the great
  numCases = atoi(line.c_str());      // |   annoyance of needing functions like "atoi" to convert.
 
  for (int c=1; c<=numCases; c++)               // For each case:
  {
    if(in.eof())  { return (-1); }              // Forget lines like this, they get too painful.
    in >> valueToAdd1;                            // |-- The >> operator is far easier than getline
    in >> valueToAdd2;                            // |   and checking "in.eof()" for each variable.
    int result = valueToAdd1 + valueToAdd2;
    out << "Case #" << c << ": " << result << endl;
  }
  in.close();       // |-- If we want to be proper we'd also want to close our files
  out.close();      // |   (yet more lines of code).
 
  return 0;
}


The example above demonstrates how much extra code can be required if we decide to open files and perform error checking from within our code. The code above has 29 non-empty lines of code including 10 for opening/closing files and error checking. Compare this to just 9 lines in our example using terse code using just "cin" and "cout" for piping files. In the code I add below you'll notice I do use comments and longer variable names than you would in "speed programming", but I've done so to help you (and myself) understand the solution! Just because you tried to program something quickly doesn't mean you won't have time to add comments afterwards!  :)


Selected Google Code Jam Problems

GCJ 2008: Round 1A. "Saving the Universe"

Read the problem here

Analysis


Solution (C++)

#include <iostream>
#include <fstream>
#include <string>
#include <hash_map>      // Or could be "map".
 
using namespace std;
 
 
int main()
{
  string line;
 
  ifstream in("A-large-practice.in");    //(0): A-very-small.in    //(1): A-small-practice.in    //(2): A-large-practice.in
  if (!in || in.eof()) {
    cerr << "ERROR: invalid file" << endl;
    return (-1);
  }
 
  int numCases;
  int numEngines;
  int numSearches;
  hash_map<string,int> engine;
  hash_map<string,int>::iterator it;
 
  getline(in, line);
  numCases = atoi(line.c_str());
 
  for (int c=0; c<numCases; c++)
  {
    //## READ IN ENGINES:
    getline(in, line);
    numEngines = atoi(line.c_str());
    engine.clear();
    for (int e=0; e<numEngines; e++)
    {
      getline(in, line);
      engine.insert(pair<string,int>(line,0));
    }
 
    //## READ IN SEARCHES:
    getline(in, line);
    numSearches = atoi(line.c_str());
    int enginesUsed = 0;
    int switches = 0;
    for (int s=0; s<numSearches; s++)
    {
      getline(in, line);
 
      if(engine[line] == 0)
      {
        enginesUsed++;
 
        if(enginesUsed >= numEngines)
        {
          enginesUsed=1;
          switches++;
          for (it = engine.begin(); it != engine.end(); ++it)
            it->second = 0;
        }
 
        engine[line] = 1;
      }
    }
 
 
    cout << "Case #" << (c+1) << ": " << switches << endl;
    //cout << "numEngines: " << (numEngines) << endl;
    //cout << "numSearches: " << (numSearches) << endl;
  }
 
  return 0;
}



GCJ 2008: Round 1A. "Minimum Scalar Product"

Read the problem here

Analysis

Solution (C++)

#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <algorithm>
using namespace std;
 
 
vector<long long> splitStringIntoLongLong(const string str, const char sep=' ')
{
  vector<long long> vect;
  unsigned int lastI = 0;
  for(unsigned int i=0; i<str.length(); i++)
  {
    if(str[i]==sep || i==(str.length()-1))
    {
      string substr = str.substr(lastI, i);
      vect.push_back(atoi(substr.c_str()));
      lastI = i;
    }
  }
  return vect;
}
 
template <class T>
void printVectorValsInLine(string prefix, vector<T> &vect)      // Can help in testing.
{
  cout << prefix;
  for(int i=0; i<vect.size(); i++)
    cout << "\t" << vect[i];
  cout << endl;
}
 
void main()
{
  ifstream in("A-small-practice.in");      // A-tiny-practice.in    // A-small-practice.in    // A-large-practice.in
  ofstream outfile("A-small-practice.out");
              //if(!in || in.eof()) {  cerr << "invalid file";  return;  }
              //if(!outfile) { cerr << invalid file"; return; }     // ERROR CHECKING (DON'T NEED).
  string line;
  vector<long long> xVect;
  vector<long long> yVect;
 
  getline(in, line);
  int numCases = atoi(line.c_str());
 
  for(int c=0; c<numCases; c++)
  {
    getline(in, line);
    int numVectors = atoi(line.c_str());
    getline(in, line);
    xVect = splitStringIntoLongLong(line, ' ');
    getline(in, line);
    yVect = splitStringIntoLongLong(line, ' ');
 
    sort(xVect.begin(), xVect.end());
    sort(yVect.begin(), yVect.end());
 
    long long sum = 0;
    for(int i=0; i<numVectors; i++)
      sum += xVect[i] * yVect[numVectors-i-1];
 
    outfile << "Case #" << c+1 << ": " << sum << endl;
  }
 
  in.close();
  outfile.close();
}

WARNING: For some reason this doesn't pass the "A-small-practice.in" test, even though the output is correct (same as other algorithms) and the "A-large-practice.in" gives correct output.


GCJ 2009: Qual Round 1A. "Alien Language"

Read the problem here

Analysis

Solution (C++)


#include <iostream>
#include <vector>
#include <string>
using namespace std;
 
const int NUM_LETTERS = 26;
const int MAX_CHARS   = 50;
 
inline int charToInt(char ch) {   // Used to quickly convert our letters (a-z) into an int (0-26).
  return ((int)ch - int('a'));
}
 
int main()
{
  int L, D, N;
  string line;
 
  cin >> L;   cin >> D;   cin >> N;   // Read in first line.
 
  //## CREATE DICTIONARY AS A SIMPLE VECTOR:
  vector<string> dict(D);       // Stores our dictionary of valid words... to avoid later "charToInt" calls we could
                                // instead store each as array of bytes (where a=0), but it wouldn't be a huge speedup.
  for(int d=0; d<D; d++)        // Read all dictionary words into vector.
    cin >> dict[d];
 
  //## CREATE AND ANALYZE UNKNOWN WORD USING A
  //## BIT ARRAY OF POSSIBLE VALUES FOR EACH LETTER:
 
  bool letterOpts[MAX_CHARS][NUM_LETTERS];      // Represents a bit vector of 26 on/off values for each letter position.
 
  for(int c=0; c<N; c++)      // For each unknown word:
  {
    cin >> line;                // Read it in as a string.
 
    for(int i=0;i<L;i++)        // Reset all our letterOpts values to 0.
      for(int j=0;j<NUM_LETTERS;j++)
        letterOpts[i][j] = false;
 
    //## PROCESS UNKNOWN WORD STRING INTO AN BIT VECTOR OF LETTER OPTIONS:
 
    int  letterIdx     = 0;               // Keep track of which letter we're up to.
    bool insideBracket = false;           // Need to keep track of weather we're inside a bracket.
    for(int i=0; i<line.length(); i++)    // For each character in our encoded string (eg: "(ab)c(de)").
    {
      if(line[i] == '(') {              // If start bracket:
        insideBracket = true;
      }
      else if(line[i] == ')') {         // If end bracket:
        insideBracket = false;
        letterIdx++;
      }
      else {                              // If its a letter:
        int charVal = charToInt(line[i]);
        letterOpts[letterIdx][charVal] = true;    // Turn on the matching value.
        if(!insideBracket)
          letterIdx++;
      }
    }
 
    //## FOR EACH DICTIONARY WORD: CHECK AGAINST OUR LETTER OPTIONS
 
    int numPossWords = 0; int i;      // Stores the number of valid words matched.
    for(int d=0; d<D; d++)            // For each word in our dictionary:
    {
      for(i=0; i<L; i++) {              // For each letter in this word:
        int charVal = charToInt(dict[d][i]);
        if(letterOpts[i][charVal]==false)     // If it doesn't match a possible letter option for this position:
          break;                                  // break.
      }
      if(i==L)                          // If all letters matched:
        numPossWords++;                   // Increment our count.
    }
    cout<<"Case #"<<c+1<<": "<<numPossWords<<endl;   // Write out result.
  }
  return 0;
}


The solution above works really well, but the first solution I implemented didn't work! The solution below uses recursion to test each permutation against a hash table of dictionary words. What I had done is made an incorrect assumption of mine that "only a few letters" might be off, and thus there wouldn't be a huge number of permutations. Some of the test cases have up to 10 letters almost completely unknown (26 different possibilities) and 2610 is over 1014 (100 trillion) values, and so it's not that the solution below didn't work, it just would have taken forever to run!

//-- My recursive solution to find word permutations and compare each to the dictionary....
//-- but ultimately failed when there became a huge number of permutations!
int findValidWords(vector<vector<char>> &letterOpts, unordered_set<string> &hashDict, int i, int L, string line)
{
  if(i>=L)      // base case
    return (hashDict.count(line)) ? 1 : 0;
 
  int sum = 0;
  for(unsigned int j=0; j<letterOpts[i].size(); j++) {
    line[i] = letterOpts[i][j];
    sum += findValidWords(letterOpts, hashDict, i+1, L, line);
  }
  return sum;
}


GCJ 2009: Qual Round 1C. "Welcome to Code Jam"

Read the problem here

Analysis

Solution (C++)

#include <unordered_map>  // |-- Or could use "hash_map" - will help us quickly identify repeated letters and 
#include <string>         //     list their index positions.
#include <iostream>       // Needed for cin / cout.
#include <list>           // |-- I have used list a little below as I thought it would be handy to quickly
#include <vector>         //     delete "unreachable" letters, but seems effiecient enough without this step.
#include <iomanip>        // For setfill() and setw().
using namespace std;
 
//----------
//-- This is a structre which helps us with "dynamic programming" to remember how many
//-- possible paths we can get to each character in our input/haystack string
 
struct idxcomb {
  int idx;      // The index where a character occurs (in our haystack string).
  int comb;     // Stores a record of how many possible pathways or "combinations" we can get to
                //  this index using previous characters in our ("welcome to..") needle string.
  idxcomb(int index, int combinations) : idx(index), comb(combinations) {}
};
 
//----------
//-- Our one function here returns the total number of sequential combinations to form our
//-- needle string "b" from our haystack string "a". Example: a="tatta", b="ta" -> 4.
//-- Notice this number can get huge so we've been asked to return only the last 4 significant digits
 
int findTotalComb(string a, string b)
{
  if(a.length()==0 || b.length()==0)   // |-- Exit early if either our haystack string (a) or our
    return 0;                          // |   "welcome to..." needle string (b) is empty.
 
  //## PROCESS ALL CHARACTERS FROM OUR "HAYSTACK" STRING (a) INTO A HASH TABLE
  //## LISTING EACH UNIQUE CHARACTER THEN THE INDEXES WHERE EACH OCCURS
 
  unsigned int i,j;
  unordered_map<char,vector<int>> charOccMap;   // Occurrences of each character using hash map.
  for(i=0; i<a.length(); i++)                   // For each character in haystack:
    charOccMap[ a[i] ].push_back(i);              // Add its index against the appropriate char.
 
  //## FOR EACH CHARACTER IN OUR NEEDLE (b): SETUP A LIST OF INDEX-COMBINATION PAIRS
  //## WHICH WILL SHOW HOW MANY POSSIBLE PATHS YOU CAN GET TO THAT INDEX VIA
  //## THE PREVIOUS CHARACTERS
 
  vector< list<idxcomb> > charOcc(b.length());  // Gives a list of index-combination pair for each char in our needle.
 
  for(i=0; i<b.length(); i++)                     // For each character in our needle:
  {
    unordered_map<char,vector<int>>::iterator it = charOccMap.find(b[i]);
                                                  // Finds the same character in our map (if it exists) 
    if(it == charOccMap.end())                    // If there are no occurrences of this character:
      return 0;                                       // There can't be any combinations.
    for(j=0; j<it->second.size(); j++)             // Otherwise we populate a list showing all its index positions.
      charOcc[i].push_back(idxcomb(it->second[j], i==0 ? 1 : 0)); // Notice we initialize possible "combinations"
  }                                                               //  as 1 for the 1st character and 0 for all others.
 
  //## FOR EACH NEEDLE CHARACTER, COMPARE EACH INDEX POSITION TO THE INDEX POSITIONS
  //## OF THE NEXT NEEDLE CHARACTER, AND TALLY UP THE NUMBER OF "POSSIBLE" 
 
  list<idxcomb>::iterator itC, itN;          // Just some iterators we'll use.
 
  for(i=0; i<charOcc.size()-1; i++)          // For all characters except the last one.
  {
    list<idxcomb> &currCh = charOcc[i];         // We want to look at the indexes of the current
    list<idxcomb> &nextCh = charOcc[i+1];       // character and next character.
 
    for(itC = currCh.begin(); itC != currCh.end(); itC++){    // For each index in our current char
    for(itN = nextCh.begin(); itN != nextCh.end(); itN++)     // Check each index in our next char.
    {
      if((*itN).idx > (*itC).idx)                               // If the second char's index is higher:
        (*itN).comb = ((*itN).comb + (*itC).comb) % 10000;        // Increase its tally of combinations.
    }}              // NOTE: we perform modulus 10000 because we want the last 4 digits
  }
 
  //## TALLY ALL THE COMBINATIONS POSSIBLE TO GET TO THE LAST CHARACTER
  //## IN OUR NEEDLE AND RETURN THAT AS THE RESULT:
 
  int numCombs = 0;
  for(itC = charOcc.back().begin(); itC != charOcc.back().end(); itC++)  // For each occurance of last character:
    numCombs = (numCombs + (*itC).comb) % 10000;                            // Tally possible combinations.
 
  return numCombs;    // Will return the (last four digits worth) of the total combinations.
}
 
 
int main()
{
  string needle   = "welcome to code jam";    // The string we want to form.
  string haystack;                            // The string we want to search for these letters.
 
  int cases;                                  // |-- Get number of cases
  cin >> cases;    getline(cin, haystack);    // |   then skip first line.
 
 
  cout << setfill('0') << setiosflags(ios::right);    // Tells cout to do padding with '0'.
                                                      // with our number on the right.
  for(int c=1; c<=cases; c++)   // for each case:
  {
    getline(cin, haystack);                           // Get the string.
    int result = findTotalComb(haystack, needle);     // Compute the possible permutations (last 4 digits).
    cout << "Case #" << c << ": " << setw(4) << result << endl;   // Pads results to 4 characters (eg: 0056).
  }
  return 0;
}





GCJ 2010: Problem B. "Reverse Words"

Read the problem here

Analysis

Solution (C++)

#include <string>
#include <iostream>
using namespace std;
 
void reverseSubstr(string &str, int s, int e)   //-- Reverse all the characters between the start (s) and end (e) positions.
{
  for(; s<e; s++, e--)         // Until our two indexes pass each other,
    swap(str[s], str[e]);      // we keep swapping characters and moving our indexes closer.
}
 
void reverseWords(string &str)      //-- Reverse all the (space seperated) words in a string.
{
  int n = str.length();
  int idxSt = 0;                      // Index where our word starts.
  for(int i=1; i<n; i++) {            // For each character in word:
    if(str[i]==' ') {                 // If end of a word found:
      reverseSubstr(str,idxSt, i-1);      // Reverse the characters in this word.
      idxSt = i+1;                        // Update the start of the next word.
    }
  }
  reverseSubstr(str,idxSt, n-1);      // Reverse the characters of the very last word (terminated by end of string).
  reverseSubstr(str, 0, n-1);         // Finish by reversing the entire string.
}
 
int main()
{
  int cases;
  string line;
  cin >> cases; getline(cin, line);   // Get number of cases (and advance getline).
 
  for(int c=0; c<cases; c++) {              // For each case:
    getline(cin, line);                     // Get line.
    reverseWords(line);                     // Perform reverse words,
    cout<<"Case #"<<c+1<<": "<<line<<endl;  // and write out.
  }
  return 0;
}



See Also


Links