Flex Search on Google App Engine

This entry is one of 3 in the series Flex on GAE

2. Google App Engine

The first time I visited James Ward’s blog, there was a reference to the NLJUG presentation and an older post about a Flex application he made. That application sends an image to Google App Engine to be enhanced there. Google App Engine supports Python, minus the modules and functionality that Google deems unsafe. That is quite understandable considering the kind of environment we are dealing with, but also very annoying at times.

The Django framework is included in the GAE SDK. Google App Engine stores its data in Google’s BigTable, which is not a relational database. Google’s documentation warns that this could be a problem for Django features that depend on a relational database. The Django template engine, on the other hand, wasn’t changed at all.

2.1. Configuration file

App Engine applications are configured by the app.yaml configuration file. The format is YAML, which originally stood for Yet Another Markup Language and now stands for YAML Ain’t Markup Language. YAML, like JSON, is a lightweight alternative to XML that organizes data using common data structures.

application: ivanidris1
version: 1
runtime: python
api_version: 1

handlers:
- url: /stylesheets
  static_dir: stylesheets
- url: /static
  static_dir: static
- url: /search.py
  script: search.py
- url: /.*
  script: main.py

app.yaml for the FlexSearch application

As you can see, app.yaml lists the application version and the URL mappings. In YAML, a colon separates a key from its value in a hash, and a dash introduces a new list item. In this configuration file the handlers key has a list as its value, and each list item is a hash that specifies one handler. Handlers are evaluated in fall-through fashion from top to bottom, so the first pattern that matches the request URL is used and /.* acts as the catch-all. There are two types of handlers: Python scripts and static ones. I use the latter mainly for style sheets and Flex release files.
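To make the structure and the top-to-bottom evaluation concrete, here is a minimal sketch in plain Python. It writes the handlers section out as a list of dictionaries and picks the first entry that matches a path. The find_handler function is hypothetical, and the matching rules (static_dir as a path prefix, script URLs as regular expressions that must match the whole path) are simplifying assumptions, not App Engine's actual implementation.

```python
import re

# The handlers section of app.yaml, written out as plain Python data:
# a list of dicts, in the same order as in the file.
HANDLERS = [
    {"url": "/stylesheets", "static_dir": "stylesheets"},
    {"url": "/static", "static_dir": "static"},
    {"url": "/search.py", "script": "search.py"},
    {"url": "/.*", "script": "main.py"},
]


def find_handler(path, handlers=HANDLERS):
    """Return the first handler whose pattern matches path, or None."""
    for handler in handlers:
        if "static_dir" in handler:
            # Assumption: a static_dir URL is treated as a path prefix.
            prefix = handler["url"]
            matched = path == prefix or path.startswith(prefix + "/")
        else:
            # Assumption: a script URL is a regex that must match the whole path.
            matched = re.match(handler["url"] + r"\Z", path) is not None
        if matched:
            return handler
    return None


print(find_handler("/search.py"))
print(find_handler("/"))
```

Because /.* is listed last, it only receives the requests that none of the more specific handlers claimed. Moving it to the top of the list would make it shadow every other handler.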

2.2. main.py

main.py is the main handler of FlexSearch. Here I am using the webapp framework provided with Google App Engine.

import os

from google.appengine.ext import webapp
from google.appengine.ext.webapp import template
from google.appengine.ext.webapp.util import run_wsgi_app


class MainPage(webapp.RequestHandler):
    def get(self):
        # Default to the search page when no page parameter is given.
        page = self.request.get('page') or 'search'

        template_values = {
            'search_url': 'static/search/bin-release/FlexSearch.html',
            'search_url_linktext': 'Search',
            'page': page,
        }

        # Only the search page exists so far; it renders the navigation menu.
        if page == 'search':
            path = os.path.join(os.path.dirname(__file__), 'nav.html')
            self.response.out.write(template.render(path, template_values))


# Route requests for / to MainPage.
application = webapp.WSGIApplication([('/', MainPage)], debug=True)


def main():
    run_wsgi_app(application)


if __name__ == "__main__":
    main()

main.py source

The code is largely based on Google’s documentation. The three mandatory parts are present: first, a RequestHandler class that processes requests and builds responses; second, a WSGIApplication instance that routes incoming requests to handlers based on the URL; third, a main routine that runs the WSGIApplication using a CGI adaptor. The get method in MainPage handles HTTP GET requests. On a successful request, the Django template engine renders a navigation menu together with a small HTML fragment containing a short explanation of FlexSearch.
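The same three parts can be tried without the App Engine SDK. The following is a rough sketch using only the WSGI support in the Python standard library, not the webapp framework itself. The names main_page and ROUTES are made up for illustration, and the response text is a placeholder.

```python
from wsgiref.handlers import CGIHandler


def main_page(environ):
    """Part 1, the handler: builds the response body for one request."""
    return "FlexSearch start page"


# Routing table: URL path -> handler, playing the role of the list of
# (path, handler) pairs that is passed to WSGIApplication.
ROUTES = {"/": main_page}


def application(environ, start_response):
    """Part 2, the WSGI application: picks a handler based on the URL path."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [handler(environ).encode("utf-8")]


def main():
    """Part 3, the main routine: runs the application through a CGI adaptor."""
    CGIHandler().run(application)
```

In this sketch main() is only defined, not called, because the CGI adaptor expects to be started by a web server that provides the request through environment variables.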

ivanidris1.appspot.com start page



    <a href="{{ search_url }}">{{ search_url_linktext }}</a>

Everything between {{ and }} in the navigation HTML page is a variable that is replaced with the matching entry from template_values when the page is rendered. The template language can do fancier things too, comparable to JSP, PHP, ASP and others. Personally, I don’t like to mix HTML with code, because it reduces readability and makes refactoring difficult.
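The substitution step can be illustrated with a few lines of Python. This is a toy sketch of the idea only, not how the Django template engine is implemented. The render function and the VARIABLE pattern are hypothetical, and replacing unknown names with an empty string is a choice made for this sketch.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template_text, values):
    """Replace every {{ name }} with values[name]; unknown names become ''."""
    return VARIABLE.sub(lambda m: str(values.get(m.group(1), "")), template_text)


template_values = {
    "search_url": "static/search/bin-release/FlexSearch.html",
    "search_url_linktext": "Search",
}
html = render('<a href="{{ search_url }}">{{ search_url_linktext }}</a>',
              template_values)
print(html)
# prints <a href="static/search/bin-release/FlexSearch.html">Search</a>
```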

2.3. search.py

search.py gets called by the Flex client. It queries Google through the Google AJAX Search API. The simplejson module bundled with Django is used for parsing: I decode and encode JSON with the simplejson loads and dumps functions. search.py sends back a filtered summary of the search results in JSON format, containing the titles and URLs of the found items.

import cgi
import urllib

from django.utils import simplejson
from google.appengine.api import urlfetch
from net.ivanidris.search import SearchRequest

# Read the search term sent by the Flex client and URL-encode it.
form = cgi.FieldStorage()
query = urllib.quote(form.getfirst("q", ""))

url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=large&q=" + query
resultList = []
startsList = []


def fetchPageResult(start):
    # Fetch one additional page of results and add it to the summary.
    request = SearchRequest.SearchRequest(url + "&start=" + start)
    resultList.extend(request.fetch())


print 'Content-type: text/plain'
print ''

# The first request returns the first page plus the cursor with all page starts.
result = urlfetch.fetch(url)

if result.status_code == 200:
    json = simplejson.loads(result.content.strip())

    if len(json['responseData']['results']) > 0:
        # Keep only the title and URL of each search result.
        for searchResult in json['responseData']['results']:
            filteredDict = dict(title = searchResult['titleNoFormatting'],
                url = searchResult['unescapedUrl'])
            resultList.append(filteredDict)

        for pages in json['responseData']['cursor']['pages']:
            startsList.append(pages['start'])

        # The first page has already been processed above.
        startsList.pop(0)

# Fetch the remaining pages.
for start in startsList:
    fetchPageResult(start)

jsonString = simplejson.dumps( resultList )
print jsonString.strip()

search.py source

Some implementation details: I request “large” result sets, which currently means eight items per page. The Search API keeps track of pagination with a cursor that lists the start offset of every page. I encapsulate requesting a single page of results in SearchRequest.py.

{"responseData": 
   {"results":
      [{"GsearchResultClass":"GwebSearch",
        "unescapedUrl":"http://ivanidris.net/",
        "url":"http://ivanidris.net/","visib ...

JSON format of the Google AJAX API search result.
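The way search.py turns the cursor into follow-up requests can be sketched on its own. The example below uses Python 3's urllib.parse instead of the Python 2 urllib module used on App Engine. The page_urls function and the sample cursor are hypothetical, and the sketch assumes that the start values arrive as strings, as the string concatenation in fetchPageResult implies.

```python
from urllib.parse import quote

BASE_URL = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=large&q="


def page_urls(query, cursor):
    """Build the URLs for every result page after the first one."""
    url = BASE_URL + quote(query)
    starts = [page["start"] for page in cursor["pages"]]
    # The first page is returned by the initial request, so skip its start.
    return [url + "&start=" + start for start in starts[1:]]


# Hypothetical cursor with three pages of eight results each.
sample_cursor = {"pages": [{"start": "0"}, {"start": "8"}, {"start": "16"}]}
for page_url in page_urls("flex search", sample_cursor):
    print(page_url)
```

Slicing with starts[1:] also copes with an empty pages list, whereas pop(0) on an empty list would raise an IndexError.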

2.4. SearchRequest.py

The SearchRequest class has a fetch method. Google App Engine provides a URL Fetch API for retrieving data over HTTP, and my fetch method calls the fetch function of that API.

from google.appengine.api import urlfetch
from django.utils import simplejson
 
class SearchRequest:
    def __init__(self, url):
        self.url = url
        self.resultList = []
 
    def fetch(self):
        result = urlfetch.fetch(self.url)
 
        if result.status_code == 200:
            json = simplejson.loads(result.content.strip())
 
            for searchResult in json['responseData']['results']:
                filteredDict = dict(title = searchResult['titleNoFormatting'],
                    url = searchResult['unescapedUrl'])
                self.resultList.append(filteredDict)
 
        return self.resultList

SearchRequest.py source

At the end, I filter out the fields I don’t need and return a list of dictionaries, each with the keys title and url holding the title and URL of the corresponding search item.

[{"url": "http:\/\/ivanidris.net\/", "title": "Ivan Idris::ivanidris.net"}, 
 {"url": "http:\/\/ivanidris.net\/wordpress\/", "title": "Ivan Idr...

JSON output of search.py
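The filtering step can be tried outside App Engine with the json module from the standard library, which offers the same loads and dumps functions as simplejson. This is a sketch under assumptions: filter_results is a hypothetical helper, and the sample response is made up and heavily shortened, keeping only the fields the code reads.

```python
import json


def filter_results(response_text):
    """Reduce a search response to a list of {'title', 'url'} dictionaries."""
    data = json.loads(response_text)
    return [
        {"title": item["titleNoFormatting"], "url": item["unescapedUrl"]}
        for item in data["responseData"]["results"]
    ]


# Hypothetical, heavily shortened response with a single result.
sample = json.dumps({
    "responseData": {
        "results": [{
            "GsearchResultClass": "GwebSearch",
            "unescapedUrl": "http://ivanidris.net/",
            "url": "http://ivanidris.net/",
            "titleNoFormatting": "Ivan Idris::ivanidris.net",
        }]
    }
})
summary = filter_results(sample)
print(json.dumps(summary))
```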

2.5. FlexApps Python project

I created a Python project in Eclipse and called it FlexApps. If Eclipse is your main IDE, I recommend doing Python projects with the help of the Pydev plugin, which also allows you to develop Jython applications. To install the plugin, point the Eclipse update manager at http://pydev.sourceforge.net/updates/. My .pydevproject file:

<?xml version="1.0" encoding="UTF-8"?>
<?eclipse-pydev version="1.0"?>
 
<pydev_project>
<pydev_pathproperty name="org.python.pydev.PROJECT_SOURCE_PATH">
<path>/FlexApps/src</path>
</pydev_pathproperty>
<pydev_property name="org.python.pydev.PYTHON_PROJECT_VERSION">python 2.5</pydev_property>
<pydev_pathproperty name="org.python.pydev.PROJECT_EXTERNAL_SOURCE_PATH">
<path>/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine</path>
<path>/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/django</path>
<path>/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webob</path>
<path>/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/yaml/lib</path>
</pydev_pathproperty>
</pydev_project>

Google App Engine requires Python 2.5. The SDK and the libraries bundled with it (django, webob and yaml) need to be on your path, which is what the external source paths above take care of. The story wouldn’t be complete without showing you my awesome FlexApps.launch file.

<?xml version="1.0" encoding="UTF-8"?>
<launchConfiguration type="org.python.pydev.debug.regularLaunchConfigurationType">
  <listAttribute key="org.eclipse.debug.core.MAPPED_RESOURCE_PATHS"/>
  <listAttribute key="org.eclipse.debug.core.MAPPED_RESOURCE_TYPES"/>
  <booleanAttribute key="org.eclipse.debug.core.appendEnvironmentVariables" value="true"/>
  <stringAttribute key="org.eclipse.ui.externaltools.ATTR_LOCATION" value="/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/dev_appserver.py"/>
  <stringAttribute key="org.eclipse.ui.externaltools.ATTR_OTHER_WORKING_DIRECTORY" value=""/>
  <stringAttribute key="org.eclipse.ui.externaltools.ATTR_TOOL_ARGUMENTS" value="&quot;${project_loc}/src&quot; --port=8080 --debug --debug_imports"/>
  <stringAttribute key="org.python.pydev.debug.ATTR_INTERPRETER" value="/System/Library/Frameworks/Python.framework/Versions/2.5/Resources/Python.app/Contents/MacOS/Python"/>
  <stringAttribute key="org.python.pydev.debug.ATTR_PROJECT" value="FlexApps"/>
</launchConfiguration>

This launch configuration file starts the test server on port 8080 in run or debug mode.
